Abstract. We systematically investigate the complexity of model checking the existential positive fragment of first-order logic. In particular, for a set of existential positive sentences, we consider model checking where the sentence is restricted to fall into the set; a natural question is then to classify which sentence sets are tractable and which are intractable. With respect to fixed-parameter tractability, we give a general theorem that reduces this classification question to the corresponding question for primitive positive logic, for a variety of representations of structures. This general theorem allows us to deduce that an existential positive sentence set having bounded arity is fixed-parameter tractable if and only if each sentence is equivalent to one in bounded-variable logic. We then use the lens of classical complexity to study these fixed-parameter tractable sentence sets. We show that such a set can be NP-complete, and consider the length needed by a translation from sentences in such a set to bounded-variable logic; we prove superpolynomial lower bounds on this length using the theory of compilability, obtaining an interesting type of formula size lower bound. Overall, the tools, concepts, and results of this article set the stage for the future consideration of the complexity of model checking on more expressive logics.

1 Introduction

Background. Model checking, the computational problem of deciding if a logical sentence holds on a structure, is a fundamental task that is ubiquitous throughout computer science. Witness its appearance in areas such as logic, artificial intelligence, database theory, constraint satisfaction, and computational complexity. It is well known to be intractable in general: for first-order logic on finite structures it is PSPACE-complete. Indeed, the natural bottom-up algorithm for evaluating a first-order sentence $\phi$ on a finite structure $\mathbf{B}$ can require time $|B|^{m(\phi)}$, where $|B|$ is the size of the universe of $\mathbf{B}$, and $m(\phi)$ denotes the maximum number of free variables over subformulas of $\phi$. This general intractability, coupled with the natural exponential dependence on the sentence, prompts the pursuit of restricted classes of sentences on which model checking is tractable.
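The bottom-up algorithm mentioned above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not code from this work: formulas are nested tuples with hypothetical tags (`"atom"`, `"eq"`, `"and"`, `"or"`, `"not"`, `"exists"`, `"forall"`), and a structure is a universe list plus a dict from relation symbols to sets of tuples. Each subformula is evaluated to the table of its satisfying assignments over its free variables, and that table is built by enumerating $|B|$ raised to the number of free variables of the subformula, which is where the $|B|^{m(\phi)}$ behaviour comes from. The example sentence reuses the variable $x$ so that only two variables are needed, the idea behind Vardi's bounded-variable observation.

```python
from itertools import product

# Hypothetical formula encoding (an assumption for this sketch):
#   ("atom", R, (x1, ..., xk)), ("eq", x, y), ("and", f, g), ("or", f, g),
#   ("not", f), ("exists", x, f), ("forall", x, f)


def free_vars(f):
    tag = f[0]
    if tag == "atom":
        return frozenset(f[2])
    if tag == "eq":
        return frozenset(f[1:3])
    if tag in ("and", "or"):
        return free_vars(f[1]) | free_vars(f[2])
    if tag == "not":
        return free_vars(f[1])
    if tag in ("exists", "forall"):
        return free_vars(f[2]) - {f[1]}
    raise ValueError(f"unknown tag {tag!r}")


def evaluate(f, universe, rels):
    """Return the set of value tuples (ordered by the sorted free variables)
    satisfying f; the main loop enumerates |universe| ** len(free_vars(f)) rows."""
    xs = sorted(free_vars(f))
    tag = f[0]
    if tag in ("and", "or", "not"):
        children = f[1:]
    elif tag in ("exists", "forall"):
        children = (f[2],)
    else:
        children = ()
    # Tables of the immediate subformulas, each paired with its variable order.
    tables = [(sorted(free_vars(g)), evaluate(g, universe, rels)) for g in children]

    def holds(env, i=0):
        ys, table = tables[i]
        return tuple(env[y] for y in ys) in table

    result = set()
    for values in product(universe, repeat=len(xs)):
        env = dict(zip(xs, values))
        if tag == "atom":
            ok = tuple(env[v] for v in f[2]) in rels[f[1]]
        elif tag == "eq":
            ok = env[f[1]] == env[f[2]]
        elif tag == "and":
            ok = holds(env, 0) and holds(env, 1)
        elif tag == "or":
            ok = holds(env, 0) or holds(env, 1)
        elif tag == "not":
            ok = not holds(env)
        elif tag == "exists":
            ok = any(holds({**env, f[1]: b}) for b in universe)
        else:  # "forall"
            ok = all(holds({**env, f[1]: b}) for b in universe)
        if ok:
            result.add(values)
    return result


def satisfies(universe, rels, sentence):
    """A sentence has no free variables, so it holds iff the empty tuple survives."""
    return () in evaluate(sentence, universe, rels)


# Example: a directed path 0 -> 1 -> 2.
universe = [0, 1, 2]
rels = {"E": {(0, 1), (1, 2)}}
# "There is a directed path with two edges", using only the variables x and y.
two_step = ("exists", "x", ("exists", "y", ("and",
            ("atom", "E", ("x", "y")),
            ("exists", "x", ("atom", "E", ("y", "x"))))))
print(satisfies(universe, rels, two_step))  # True
```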

Certainly, one can pursue such tractable fragments with respect to the classical and well-established notion of polynomial-time tractability. However, as has been articulated in the literature, the typical situation in practical database settings is the evaluation of a short query against a large database, or, in logical parlance, evaluating a short formula on a large relational structure (see for example the discussion of Grohe, Schwentick, and Segoufin [H]). This suggests that one might relax the definition of polynomial-time tractability by requiring the running time to exhibit a polynomial dependence solely on the database, and allowing arbitrary dependence on the formula. Relaxing polynomial-time tractability so that arbitrary dependence in some parameter is tolerated yields, in essence, the notion of fixed-parameter tractability. This notion is the base tractability notion of parameterized complexity theory, an alternative framework for classifying the complexity of problems.

Within relational first-order logic, there is currently a mature understanding of model checking on primitive positive logic, which consists of the first-order formulas built from atoms, conjunction, and existential quantification. Let $\mathcal{F}$ be a set of primitive positive sentences having bounded arity, by which is meant that there is a constant upper bounding the arity of all relation symbols appearing in the sentences. It is known that model checking on $\mathcal{F}$ is tractable, for either of the tractability notions discussed, if and only if there exists a constant $k \geq 1$ such that each sentence in $\mathcal{F}$ is logically equivalent to a sentence using $k$ (or fewer) variables. This result is due to Dalmau, Kolaitis, and Vardi [5] and Grohe [13], and is proved under typical complexity-theoretic assumptions. This tractability condition is clearly related to, and can be viewed as an extension of, Vardi's classic observation [22] on the tractability of bounded-variable first-order logic: for each $k \geq 1$, model checking on first-order logic limited to $k$ variables is polynomial-time tractable, via the natural bottom-up evaluation algorithm. Note that there are many possible ways to represent the relations of structures; in the case of bounded arity, reasonable representations will be equivalent, but when the arity is unbounded, the complexity of a sentence set may be sensitive to the representation used. (See Chen and Grohe [5] and Marx [20, 19] for work on model checking primitive positive sentences having unbounded arity, with respect to various representations.)

In this article, we study the complexity of model checking in existential positive logic, by which we mean the extension of primitive positive logic where disjunction is permitted: a formula is existential positive if it is built from atoms, conjunction, disjunction, and existential quantification. This is a natural restriction of first-order logic which has been of primary interest in a number of studies. For instance, it is reported by Abiteboul, Hull, and Vianu [1] that so-called unions of conjunctive queries, also known as select-project-join-union queries, are the most common queries to databases; these queries are semantically equivalent to existential positive formulas. Also, existential positive logic has been the subject of focused investigation in finite model theory in connection with understanding the status of the homomorphism preservation theorem on (classes of) finite structures; see for example [4, 21] and the references therein. To the best of our knowledge, there has not previously been any systematic study of the complexity of model checking fragments of existential positive logic; this state of affairs is surprising, given the level of attention that this logic has received in other settings. We here aim to remedy this gap in the literature.

Overview of results. We now turn to give an overview of our results; the reader is advised to refer to the technical sections of the paper for precise statements. First, we identify the notion of a usable representation of structures; this notion requires some very mild assumptions to hold on the representation. We show that, relative to a fixed usable representation of structures, a classification of the sets of existential positive sentences that are fixed-parameter tractable can be derived from a classification of the sets of primitive positive sentences that are fixed-parameter tractable (Theorem 9). In essence, for each usable representation, we reduce the classification of fixed-parameter tractable fragments of existential positive logic to the corresponding classification for primitive positive logic. This allows us to deduce a classification in the case of bounded arity, whose statement is virtually identical to the corresponding statement for primitive positive logic: let $\mathcal{F}$ be a set of existential positive sentences having bounded arity; model checking on $\mathcal{F}$ is fixed-parameter tractable if and only if there exists a constant $k \geq 1$ such that each sentence in $\mathcal{F}$ is logically equivalent to a $k$-variable sentence (Theorem 10 and Proposition 11). As before, this is under typical complexity-theoretic assumptions, and one can again see the resemblance to Vardi's observation.

Having obtained a description of the fixed-parameter tractable sentence sets, under bounded arity, we then study sentence sets using the lens of classical complexity. (For ease of discussion, let us assume here and in the rest of this introduction that sentence refers to an existential positive sentence.) We show that there are fixed-parameter tractable sentence sets that are NP-complete; in particular, we show NP-completeness for the set of sentences that are logically equivalent to 2-variable sentences (Proposition 12). We thus observe a divergence between fixed-parameter tractability and polynomial-time tractability that does not occur in primitive positive logic.

For a sentence set witnessing this divergence, that is, a sentence set $\mathcal{F}$ that is fixed-parameter tractable but NP-complete, each sentence therein has a logically equivalent, bounded-variable sentence, but this equivalent sentence must in general be hard to compute; for, if it were easily (polynomial-time) computable, model checking on $\mathcal{F}$ would be in polynomial time by Vardi's observation, a contradiction (assuming $\mathrm{P} \neq \mathrm{NP}$). It is worthwhile to diagnose this situation by investigating why the equivalent bounded-variable sentences are hard to compute, given their computational usefulness and their potential as a target format in which to preprocess queries. We carry out such a diagnosis and demonstrate that, with respect to various signatures, two dramatically different reasons can underlie the divergence (Theorem [TH and Corollary ITST).

— On a signature consisting of finitely many unary relation symbols, we show that each sentence has an equivalent bounded-variable sentence of constant length.

— On all other signatures, we prove that there exists a sentence set $\mathcal{F}$ (witnessing the divergence) such that there is no translation from a sentence in $\mathcal{F}$ to an equivalent bounded-variable sentence of polynomial length. That is, there is no way to preprocess the $\mathcal{F}$-sentences into bounded-variable sentences without increasing their length superpolynomially.

Intuitively speaking, in the first case, short bounded-variable sentences exist, but they are difficult to compute; in the second case, short (polynomial-length) bounded-variable sentences do not exist, so there is no sense in even asking about computing them in polynomial time. The latter result is proved in a general form that implies that there is no polynomial-length translation to any format that allows for polynomial-time query evaluation (see the statement of Theorem 14), and gives a formal limit on the extent to which the original sentences can be preprocessed or compiled. This result is proved by using notions and techniques from the theory of compilability developed by Cadoli et al. [5], and is proved under the assumption that the polynomial hierarchy does not collapse. Although this result is proved under a complexity-theoretic assumption, we believe that it constitutes an interesting formula size lower bound and can be taken as a contribution to the literature on formula size lower bounds (see for example [3, 12]).

Our study thus yields a fundamental understanding of the complexity of model checking existential positive logic. We make a methodological contribution by employing the theory of compilability to link classical complexity with parameterized complexity, in particular, to gain an understanding of the sentence sets that are simultaneously fixed-parameter tractable and NP-complete; this linking could be of independent interest and of utility for analyzing other computational problems. Overall, the tools, concepts, and results of our work set the stage for the future consideration of the complexity of model checking on more expressive logics.

2 Preliminaries 

Structures. In this paper, we consider only relational structures. A signature is a set of relation symbols; each relation symbol has associated to it a finite arity $k \geq 1$. A relation symbol of arity 1 is said to be unary. A structure $\mathbf{B}$ over a signature $\sigma$ consists of a universe $B$, which is a non-empty set denoted with the letter of its structure in non-bold typeface, and a relation $R^{\mathbf{B}} \subseteq B^k$ for each relation symbol $R \in \sigma$; here, $k$ denotes the arity of $R$.

A collection of structures is said to be similar if they share the same signature. Let $\mathbf{A}$, $\mathbf{B}$ be similar structures on the signature $\sigma$. A homomorphism from $\mathbf{A}$ to $\mathbf{B}$ is a mapping $h : A \to B$ such that for each symbol $R \in \sigma$, it holds that $h(R^{\mathbf{A}}) \subseteq R^{\mathbf{B}}$, by which is meant that for each tuple $(a_1, \ldots, a_k) \in R^{\mathbf{A}}$, one has $(h(a_1), \ldots, h(a_k)) \in R^{\mathbf{B}}$. We will sometimes simply write $\mathbf{A} \to \mathbf{B}$ to indicate that there exists a homomorphism from $\mathbf{A}$ to $\mathbf{B}$. We say that $\mathbf{A}$ and $\mathbf{B}$ are homomorphically equivalent if $\mathbf{A} \to \mathbf{B}$ and $\mathbf{B} \to \mathbf{A}$ both hold. The structure $\mathbf{B}$ is a substructure of the structure $\mathbf{A}$ if $B \subseteq A$ and $R^{\mathbf{B}} \subseteq R^{\mathbf{A}}$ for all relation symbols $R$. When $\mathbf{B}$ is a substructure of $\mathbf{A}$ and $h$ is a homomorphism from $\mathbf{A}$ to $\mathbf{B}$ that fixes each element $b \in B$, the mapping $h$ is said to be a retraction from $\mathbf{A}$ to $\mathbf{B}$; when there exists a retraction from $\mathbf{A}$ to $\mathbf{B}$, it is said that $\mathbf{A}$ retracts to $\mathbf{B}$. A core of the structure $\mathbf{A}$ is a structure $\mathbf{C}$ such that $\mathbf{A}$ retracts to $\mathbf{C}$, but $\mathbf{A}$ does not retract to any proper substructure of $\mathbf{C}$. We will make use of the following well-known facts on cores [15]: (1) each finite structure has a core; (2) all cores of a finite structure are isomorphic. From these facts, it is reasonable to speak of the core of a finite structure, which we do, and we use $\mathrm{core}(\mathbf{A})$ to denote a representative from the set of all cores of a finite structure $\mathbf{A}$. The product of $\mathbf{A}$ and $\mathbf{B}$, denoted by $\mathbf{A} \times \mathbf{B}$, is the structure with universe $A \times B$ and where
$$R^{\mathbf{A} \times \mathbf{B}} = \{((a_1, b_1), \ldots, (a_k, b_k)) \mid (a_1, \ldots, a_k) \in R^{\mathbf{A}}, (b_1, \ldots, b_k) \in R^{\mathbf{B}}\}$$
for each symbol $R$. We will make use of the following well-known fact concerning products.

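Proposition 1. Let $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{B}'$ be similar structures. There are homomorphisms $\mathbf{A} \to \mathbf{B}$ and $\mathbf{A} \to \mathbf{B}'$ if and only if there is a homomorphism $\mathbf{A} \to \mathbf{B} \times \mathbf{B}'$.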
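Homomorphisms, products, and Proposition 1 can be explored on tiny examples with the brute-force Python sketch below. It is our own illustration rather than anything from this work: a structure is assumed to be a pair of a universe list and a dict from relation symbols to sets of tuples, similar structures list the same symbols, and the hypothetical helper `has_hom` simply tries all $|B|^{|A|}$ maps, so it is exponential. The final loop checks Proposition 1 on every triple drawn from a few small digraphs.

```python
from itertools import product

# A structure is (universe, rels): a list of elements and a dict mapping each
# relation symbol to a set of tuples. Similar structures share the dict keys.


def is_hom(h, A, B):
    """True if the dict h maps every tuple of every relation of A into B."""
    return all(
        tuple(h[a] for a in t) in B[1][R] for R, ts in A[1].items() for t in ts
    )


def has_hom(A, B):
    """Brute force over all |B|^|A| maps; only for tiny structures."""
    univ_a, univ_b = A[0], B[0]
    return any(
        is_hom(dict(zip(univ_a, image)), A, B)
        for image in product(univ_b, repeat=len(univ_a))
    )


def structure_product(B1, B2):
    """B1 x B2: universe of pairs, relations combined coordinatewise."""
    universe = [(b1, b2) for b1 in B1[0] for b2 in B2[0]]
    rels = {
        R: {tuple(zip(s, t)) for s in B1[1][R] for t in B2[1][R]} for R in B1[1]
    }
    return universe, rels


def digraph(n, edges):
    return list(range(n)), {"E": set(edges)}


K2 = digraph(2, [(0, 1), (1, 0)])
K3 = digraph(3, [(a, b) for a in range(3) for b in range(3) if a != b])
C3 = digraph(3, [(0, 1), (1, 2), (2, 0)])  # directed 3-cycle
P3 = digraph(3, [(0, 1), (1, 2)])  # directed path with two edges
LOOP = digraph(1, [(0, 0)])
small = [K2, K3, C3, P3, LOOP]

print(has_hom(K3, K2))  # False: a triangle is not 2-colorable

# Proposition 1, checked on every triple of small structures.
for A, B, B2 in product(small, repeat=3):
    assert (has_hom(A, B) and has_hom(A, B2)) == has_hom(A, structure_product(B, B2))
```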



Formulas. In this paper, we study relational first-order logic and fragments thereof. An atom (over signature $\sigma$) is an equality of variables $x = y$ or a predicate application $R(x_1, \ldots, x_k)$, where $R \in \sigma$ and $R$ has arity $k$. A formula (over signature $\sigma$) is built from atoms (over $\sigma$), negation, conjunction, disjunction, existential quantification, and universal quantification. An existential positive formula is a formula built from atoms, conjunction, disjunction, and existential quantification. A primitive positive formula is a formula built from atoms, conjunction, and existential quantification. We use FO to denote the set of all first-order formulas, EP to denote the set of all existential positive formulas, PP to denote the set of all primitive positive formulas, and $\bigvee$PP to denote the set of all formulas that are disjunctions of primitive positive formulas. For each $k \geq 1$, we use $\mathrm{FO}^k$ to denote the subset of FO containing formulas that use $k$ (or fewer) variables, and we define $\mathrm{EP}^k$, $\mathrm{PP}^k$, and $(\bigvee\mathrm{PP})^k$ analogously. For a signature $\sigma$, we add a $\sigma$ subscript to the notation for a set of formulas to indicate a restriction to those formulas over $\sigma$; for instance, $\mathrm{FO}^k_\sigma$ denotes the formulas in $\mathrm{FO}^k$ that are over signature $\sigma$. A sentence is a formula having no free variables.

Each structure $\mathbf{A}$ naturally induces a primitive positive sentence: letting $\{a_1, \ldots, a_n\}$ denote the elements of $A$, we define the sentence $Q[\mathbf{A}]$, called the canonical query of $\mathbf{A}$, to be
$$\exists a_1 \cdots \exists a_n \bigwedge_{R \in \sigma} \bigwedge_{(a_1, \ldots, a_k) \in R^{\mathbf{A}}} R(a_1, \ldots, a_k).$$
(Note that if the quantifier-free part is empty, one can insert an equality $a = a$ for some $a \in A$ to make it non-empty.) We have the following classical theorem.

Theorem 2. (Chandra-Merlin Jfjj]) Let $\mathbf{A}$, $\mathbf{B}$ be similar finite structures. The following are equivalent:

— There is a homomorphism $\mathbf{A} \to \mathbf{B}$.

— $\mathbf{B} \models Q[\mathbf{A}]$.

— $Q[\mathbf{B}] \models Q[\mathbf{A}]$.

One can also naturally pass from a primitive positive sentence to a structure, as follows. Convert the primitive positive sentence $\psi$ to prenex normal form. Then, eliminate equalities as follows: an equality $a = a$ on a single variable is simply removed; for an equality $a = a'$ on distinct variables, replace all instances of $a'$ with $a$ in the quantifier-free part, and then follow the removal process for an equality on a single variable. Define $C[\psi]$ to be the structure having a universe element for each existentially quantified variable in the resulting object, and where, for each $R \in \sigma$, the relation $R^{C[\psi]}$ contains $(a_1, \ldots, a_k)$ if and only if $R(a_1, \ldots, a_k)$ appears in the quantifier-free part of the resulting object. It is straightforward to verify that each primitive positive sentence $\psi$ is logically equivalent to $Q[C[\psi]]$, although these sentences may be syntactically different, due to the elimination of equalities in the just-described conversion. Similarly, it can be verified that each finite structure $\mathbf{A}$ is homomorphically equivalent to $C[Q[\mathbf{A}]]$. One can then derive consequences of Theorem 2 such as that for any primitive positive sentence $\psi$ and any structure $\mathbf{B}$, the condition $\mathbf{B} \models \psi$ is equivalent to the condition $C[\psi] \to \mathbf{B}$. We will make use of such consequences.
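The passage from a primitive positive sentence to $C[\psi]$, and the consequence that $\mathbf{B} \models \psi$ holds exactly when $C[\psi] \to \mathbf{B}$, can be checked on small cases with the following Python sketch. It is an illustration under our own encoding assumptions, and all function names are hypothetical: a prenex primitive positive sentence is a pair `(variables, atoms)`, where an atom is either `("=", x, y)` or `(R, (x1, ..., xk))`, and equalities are eliminated with a union-find structure. Unlike the description above, merged variables do not keep an element of their own; this only drops isolated elements and so does not affect homomorphisms into non-empty structures.

```python
from itertools import product

# Hypothetical encoding: a prenex PP sentence is (variables, atoms), where an
# atom is ("=", x, y) or (R, (x1, ..., xk)); a structure is (universe, rels).


def find(parent, v):
    while parent[v] != v:
        v = parent[v]
    return v


def structure_of(sentence):
    """C[psi]: merge variables linked by equalities, keep the relational atoms.
    Merged variables get no element of their own (only isolated elements are lost)."""
    variables, atoms = sentence
    parent = {v: v for v in variables}
    for atom in atoms:
        if atom[0] == "=":
            _, a, a_prime = atom
            parent[find(parent, a_prime)] = find(parent, a)  # replace a' by a
    universe = sorted({find(parent, v) for v in variables})
    rels = {}
    for atom in atoms:
        if atom[0] != "=":
            R, args = atom
            rels.setdefault(R, set()).add(tuple(find(parent, x) for x in args))
    return universe, rels


def satisfies_pp(B, sentence):
    """Brute force: does some assignment of the variables make every atom true?"""
    variables, atoms = sentence
    universe, rels = B

    def holds(atom, env):
        if atom[0] == "=":
            return env[atom[1]] == env[atom[2]]
        R, args = atom
        return tuple(env[x] for x in args) in rels[R]

    for values in product(universe, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(holds(atom, env) for atom in atoms):
            return True
    return False


def has_hom(A, B):
    """Brute force: is there a homomorphism A -> B?"""
    for image in product(B[0], repeat=len(A[0])):
        h = dict(zip(A[0], image))
        if all(tuple(h[a] for a in t) in B[1][R] for R, ts in A[1].items() for t in ts):
            return True
    return False


def digraph(n, edges):
    return list(range(n)), {"E": set(edges)}


# psi = exists x, y, z (E(x, y) and y = z and E(z, x)): "a 2-cycle or a loop".
psi = (["x", "y", "z"], [("E", ("x", "y")), ("=", "y", "z"), ("E", ("z", "x"))])
samples = [digraph(1, [(0, 0)]), digraph(2, [(0, 1), (1, 0)]),
           digraph(3, [(0, 1), (1, 2), (2, 0)]), digraph(3, [(0, 1), (1, 2)])]
for B in samples:
    # Consequence of Theorem 2: B |= psi iff C[psi] -> B.
    assert satisfies_pp(B, psi) == has_hom(structure_of(psi), B)
```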
We have the following basic proposition on existential positive sentences. 

Proposition 3. There exists a computable mapping $M$ that associates, to each existential positive sentence $\phi$, a non-empty finite set $M(\phi)$ of primitive positive sentences such that the following properties hold.

— The sentence $\phi$ is logically equivalent to the sentence $\bigvee_{\psi \in M(\phi)} \psi$.

— For any two distinct sentences $\psi, \psi' \in M(\phi)$, it holds that $\psi \not\models \psi'$.

Proof. We describe the action of $M$. First, the sentence $\phi$ is converted to a disjunction $\phi'$ of primitive positive sentences by induction, via the syntactic transformations $\exists x (\bigvee_i \alpha_i) \Rightarrow \bigvee_i (\exists x\, \alpha_i)$ and $(\bigvee_i \alpha_i) \wedge (\bigvee_j \beta_j) \Rightarrow \bigvee_{i,j} (\alpha_i \wedge \beta_j)$. We define an equivalence relation on the primitive positive sentences appearing in $\phi'$ (as disjuncts): two such sentences $\alpha$, $\beta$ are equivalent if and only if they are logically equivalent. Note that this equivalence relation can be computed, since by Theorem 2 and the surrounding discussion, $\alpha$ and $\beta$ are logically equivalent if and only if $C[\alpha]$ and $C[\beta]$ are homomorphically equivalent; the latter condition is clearly computable. Let us say that an equivalence class $\Gamma$ is extremal if, whenever $\alpha \in \Gamma$ and $\beta$ is a primitive positive sentence in $\phi'$, $\alpha \models \beta$ implies $\beta \in \Gamma$. Again, by Theorem 2 it can be computed whether an equivalence class is extremal. Define $M(\phi)$ to be a set that contains one representative from each extremal equivalence class. It is straightforward to verify that $M(\phi)$ has the desired properties. □
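The selection step of this proof can be prototyped as in the Python sketch below; it is ours, not part of the proof. It assumes the disjuncts of $\phi'$ are already given as their structures $C[\alpha]$, each encoded as a universe list plus a dict from relation symbols to sets of tuples, with every structure listing the same symbols. It decides $\alpha \models \beta$ by testing $C[\beta] \to C[\alpha]$, which is what Theorem 2 and the surrounding discussion give, and finds homomorphisms by brute force, so only tiny inputs are practical. The names `entails` and `reduce_disjunction` are hypothetical.

```python
from itertools import product


def has_hom(A, B):
    """Brute-force test for a homomorphism A -> B (exponential; tiny inputs only)."""
    for image in product(B[0], repeat=len(A[0])):
        h = dict(zip(A[0], image))
        if all(tuple(h[a] for a in t) in B[1][R] for R, ts in A[1].items() for t in ts):
            return True
    return False


def entails(alpha, beta):
    """alpha |= beta iff C[beta] -> C[alpha]; both are given as structures C[.]."""
    return has_hom(beta, alpha)


def equivalent(alpha, beta):
    return entails(alpha, beta) and entails(beta, alpha)


def reduce_disjunction(disjuncts):
    """Keep one representative of each extremal equivalence class of disjuncts."""
    kept = []
    for a in disjuncts:
        # Extremal: every disjunct entailed by a is equivalent to a.
        extremal = all(equivalent(a, b) for b in disjuncts if entails(a, b))
        if extremal and not any(equivalent(a, k) for k in kept):
            kept.append(a)
    return kept


def digraph(n, edges):
    return list(range(n)), {"E": set(edges)}


loop = digraph(1, [(0, 0)])                # exists x E(x, x)
edge = digraph(2, [(0, 1)])                # exists x, y E(x, y)
two_cycle = digraph(2, [(0, 1), (1, 0)])
path = digraph(3, [(0, 1), (1, 2)])

# Every other disjunct entails "there is an edge", so only that one survives.
print(reduce_disjunction([loop, edge, two_cycle, path]) == [edge])  # True
```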

Treewidth. A tree decomposition of a structure $\mathbf{B}$ is a pair $(T, \beta)$ consisting of a tree $T$ and a map $\beta : V^T \to \wp(B)$ that associates each vertex $t$ of $T$ with a non-empty subset $\beta(t)$ of $B$, called the bag of $t$, such that the following conditions hold:

— For each $b \in B$, the vertices $\{t \mid b \in \beta(t)\}$ form a connected subtree of $T$.

— For each tuple $(b_1, \ldots, b_k)$ appearing in a relation of $\mathbf{B}$, there exists a vertex $t \in V^T$ such that $\{b_1, \ldots, b_k\} \subseteq \beta(t)$.

The width of a tree decomposition $(T, \beta)$ is defined as $(\max_{t \in V^T} |\beta(t)|) - 1$. The treewidth of a structure $\mathbf{B}$, denoted by $\mathrm{tw}(\mathbf{B})$, is the minimum width over all tree decompositions of $\mathbf{B}$. We have the following theorem relating the treewidth of a structure to bounded-variable primitive positive logic.

Theorem 4. (follows from Theorem 12]) Let $\mathbf{A}$ be a structure, and let $k \geq 1$. The following are equivalent:

— It holds that $\mathrm{tw}(\mathrm{core}(\mathbf{A})) < k$.

— The primitive positive sentence $Q[\mathbf{A}]$ is logically equivalent to a sentence in $\mathrm{PP}^k$.
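As a small sanity check of the notions in Theorem 4, the Python sketch below (our own illustration, brute force throughout) computes a core and the width of a candidate tree decomposition. It takes the core to be a smallest induced substructure that the structure maps to: such a substructure is homomorphically equivalent to the original, and by minimality it admits no retraction to anything smaller. It does not compute treewidth itself; it only checks the two conditions of a tree decomposition for a given tree and bags. Structures are pairs of a universe list and a dict of relations, and the helper names are hypothetical.

```python
from itertools import combinations, product


def has_hom(A, B):
    """Brute-force test for a homomorphism A -> B."""
    for image in product(B[0], repeat=len(A[0])):
        h = dict(zip(A[0], image))
        if all(tuple(h[a] for a in t) in B[1][R] for R, ts in A[1].items() for t in ts):
            return True
    return False


def induced(A, S):
    """Induced substructure of A on the element set S."""
    S = set(S)
    return ([a for a in A[0] if a in S],
            {R: {t for t in ts if set(t) <= S} for R, ts in A[1].items()})


def core(A):
    """A smallest induced substructure C with A -> C (brute force)."""
    for size in range(1, len(A[0]) + 1):
        for S in combinations(A[0], size):
            C = induced(A, S)
            if has_hom(A, C):
                return C
    return A  # not reached: A always maps to itself


def decomposition_width(A, tree_edges, bags):
    """Width of (T, bags) if it is a tree decomposition of A, else None.
    T is assumed to be a tree on the keys of `bags`, given by `tree_edges`."""
    # Every tuple of every relation must lie inside some bag.
    for ts in A[1].values():
        for t in ts:
            if not any(set(t) <= bag for bag in bags.values()):
                return None
    # The vertices whose bags contain a given element must form a connected subtree.
    for b in A[0]:
        nodes = {v for v, bag in bags.items() if b in bag}
        if not nodes:
            return None
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for x, y in tree_edges:
                for p, q in ((x, y), (y, x)):
                    if p == u and q in nodes and q not in seen:
                        seen.add(q)
                        stack.append(q)
        if seen != nodes:
            return None
    return max(len(bag) for bag in bags.values()) - 1


# Symmetric 4-cycle 0 - 1 - 2 - 3 - 0.
C4 = ([0, 1, 2, 3], {"E": {(i, (i + 1) % 4) for i in range(4)}
                     | {((i + 1) % 4, i) for i in range(4)}})
print(decomposition_width(C4, [("s", "t")], {"s": {0, 1, 2}, "t": {0, 2, 3}}))  # 2
K = core(C4)  # the symmetric 4-cycle retracts to a single symmetric edge
print(decomposition_width(K, [], {"r": {0, 1}}))  # 1
```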

Parameterized complexity. We overview the elements of parameterized complexity that will be used in the paper, and refer the reader to the book by Flum and Grohe [10] for more information.

Throughout the paper, we use $\Sigma$ to denote an alphabet used to encode objects. A parameterization is a polynomial-time computable mapping $\kappa$ that maps each string $x \in \Sigma^*$ to a parameter $\kappa(x)$. A parameterized problem is a pair $(Q, \kappa)$ consisting of a decision problem $Q \subseteq \Sigma^*$ and a parameterization $\kappa$.

A mapping $g$ defined on $\Sigma^*$ is said to be non-uniformly fixed-parameter tractable (nuFPT) with respect to a parameterization $\kappa$ if there exist a function $f$ and a polynomial $p$ (both over the natural numbers) such that for every parameter $k$, there exists an algorithm $A_k$ that computes $g$ on $\{x \in \Sigma^* \mid \kappa(x) = k\}$ in time bounded above by $f(\kappa(x)) \cdot p(|x|)$. A mapping $g$ defined on $\Sigma^*$ is said to be fixed-parameter tractable (FPT) with respect to a parameterization $\kappa$ if there exists a single algorithm $A$ that can, for every $k$, play the role of $A_k$ in the definition of nuFPT. A decision problem $(Q, \kappa)$ is in nuFPT if the characteristic function of $Q$ is nuFPT with respect to $\kappa$, and is in FPT if the characteristic function of $Q$ is FPT with respect to $\kappa$.

Let $(Q, \kappa)$, $(Q', \kappa')$ be parameterized problems. An nuFPT (respectively, FPT) reduction from $(Q, \kappa)$ to $(Q', \kappa')$ is an nuFPT (respectively, FPT) mapping $g$ such that (1) for all $x \in \Sigma^*$, it holds that $x \in Q$ if and only if $g(x) \in Q'$, and (2) for each $k$, the set $\kappa'(g(\{x \mid \kappa(x) = k\}))$ is finite. An FPT Turing reduction from $(Q, \kappa)$ to $(Q', \kappa')$ is an FPT mapping $g$ that can pose oracle queries to $Q'$ such that (1) $g$ computes the characteristic function of $Q$, and (2) for each $k$, the set $\kappa'(O(\{x \mid \kappa(x) = k\}))$ is finite, where $O(x)$ denotes the set of oracle queries to $Q'$ made by $g$ on input $x$.

We will make use of the following facts. 

Proposition 5. The composition of an nuFPT reduction from $(Q, \kappa)$ to $(Q', \kappa')$ and an nuFPT reduction from $(Q', \kappa')$ to $(Q'', \kappa'')$ is an nuFPT reduction from $(Q, \kappa)$ to $(Q'', \kappa'')$.

Proposition 6. The class FPT is closed under FPT Turing reductions.

The parameterized complexity class W[1] is often said to be the analog of NP in the world of parameterized complexity, and it is widely believed that W[1] is not contained in FPT.



Computational problems. In this paper, we study model checking problems, which involve deciding if a sentence is true on a structure. There are different ways that structures can be represented; in particular, there are different ways that their relations can be represented, and the representation used can impact the complexity of model checking [8]. We will show a classification result that holds on a wide class of representations. Formally, by a representation, we mean a map $r$ from $\Sigma^*$ to the class of finite structures. One representation that we will study is the explicit representation, where a relation is represented by an explicit listing of the tuples that it contains; we refer the reader to [11] for a discussion of such a representation. Note that when the arity is bounded, reasonable representations will be equivalent under polynomial-time translations to the explicit representation, and hence also to each other.

Let $r$ be a representation, and let $\mathcal{F}$ be a set of formulas. We define EP-MC$_r(\mathcal{F})$ to be the problem of deciding, given a pair $(\phi, x)$ consisting of an existential positive sentence $\phi \in \mathcal{F}$ and a string $x$ representing a finite structure $r(x) = \mathbf{B}$ over the signature of $\phi$, whether or not $\mathbf{B} \models \phi$. (Note that when we discuss a problem of the form EP-MC$_r(\mathcal{F})$, typically, all formulas in $\mathcal{F}$ will be existential positive.) The problem PP-MC$_r(\mathcal{F})$ is defined similarly, but with respect to the primitive positive sentences in $\mathcal{F}$. Finally, the problem FO-MC$_r(\mathcal{F})$ is defined with respect to all sentences in $\mathcal{F}$. Omission of the $r$ subscript, for example, the notation EP-MC$(\mathcal{F})$, denotes the respective problem under the explicit representation. We will sometimes view these problems as parameterized problems; such viewing is always with respect to the parameterization $\kappa(\phi, x) = \phi$.

We have the following previous results on the complexity of the problems PP-MC$(\mathcal{F})$.

Theorem 7. (Dalmau, Kolaitis, and Vardi [5]) Let $\mathcal{F}$ be a set of primitive positive sentences such that the set of structures $\{\mathrm{core}(C[\psi]) \mid \psi \in \mathcal{F}\}$ has bounded treewidth. Then, the problem PP-MC$(\mathcal{F})$ is polynomial-time decidable, and hence fixed-parameter tractable.

Theorem 8. (Grohe [13]) Let $\mathcal{F}$ be a set of primitive positive sentences having bounded arity such that the set of structures $\{\mathrm{core}(C[\psi]) \mid \psi \in \mathcal{F}\}$ has unbounded treewidth. Then, the problem PP-MC$(\mathcal{F})$ is W[1]-hard under nuFPT reductions.

3 Parameterized Complexity Classification 

In this section, we give a general theorem that, with respect to certain representations, allows one to derive a parameterized complexity classification of the problems EP-MC$_r(\mathcal{F})$ from a corresponding classification of the problems PP-MC$_r(\mathcal{F})$. This general theorem will be established under mild assumptions on the representation, formalized by the following definition. We say that a representation $r$ is usable if the following two conditions hold.

— There exists a computable mapping $t : \Sigma^* \to \Sigma^*$ where for each string $x \in \Sigma^*$ representing a structure $\mathbf{A}$ in the explicit representation, the string $t(x)$ represents $\mathbf{A}$ under $r$, that is, $r(t(x)) = \mathbf{A}$.

— There exists a mapping $p : \Sigma^* \times \Sigma^* \to \Sigma^*$ that is FPT with respect to the parameterization $\pi_1(x, y) = x$ such that for all strings $x, y \in \Sigma^*$, it holds that the structure $r(p(x, y))$ is equal to the product structure $r(x) \times r(y)$.

The first condition posits the existence of a translation from the explicit representation to the representation $r$, and the second condition asserts the existence of a product mapping that maps two input strings to a string that represents the product of the structures represented by the input strings. It is straightforward to verify that the explicit representation is usable; we now turn to look at another example representation.

Example 1. Previous work [8] studied a representation of relations called the generalized DNF (GDNF) representation. A GDNF representation of a relation $T \subseteq B^k$ is an expression of the form $\bigcup_{i=1}^{m} (P_{i1} \times \cdots \times P_{ik})$ where each $P_{ij}$ is a subset of $B$. This representation is readily seen to be a natural generalization of the DNF representation of relations on the Boolean domain. We let $g$ denote the representation of structures where relations are presented in GDNF form.

We briefly verify that the representation $g$ is usable. A relation $\{(b_{11}, \ldots, b_{1k}), \ldots, (b_{m1}, \ldots, b_{mk})\}$ presented in the explicit representation can be readily translated to the GDNF representation $\bigcup_{i=1}^{m} (\{b_{i1}\} \times \cdots \times \{b_{ik}\})$. Given two relations $\bigcup_{i=1}^{m} (P_{i1} \times \cdots \times P_{ik})$ and $\bigcup_{j=1}^{n} (Q_{j1} \times \cdots \times Q_{jk})$ in the GDNF representation, their product is equal to the relation represented by
$$\bigcup_{i=1}^{m} \bigcup_{j=1}^{n} \big((P_{i1} \times Q_{j1}) \times \cdots \times (P_{ik} \times Q_{jk})\big).$$
This GDNF representation of the product can be computed in time polynomial in the sum of the two input representations' lengths, and hence computing this product has the desired FPT property. (One can note, indeed, that when the first input relation is fixed, the product representation has length linear in the length of the second input relation's representation.)
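The translation $t$ and the product mapping for GDNF relations can be sketched in Python as below. This is our illustration of the construction just described, not code from [8]: a GDNF relation of arity $k$ is assumed to be a list of boxes, each box a $k$-tuple of frozensets $(P_{i1}, \ldots, P_{ik})$ standing for $P_{i1} \times \cdots \times P_{ik}$. The helper `expand` materializes the explicit relation and is used only to check the result; all names are hypothetical.

```python
from itertools import product


def explicit_to_gdnf(tuples):
    """t: one box {b_1} x ... x {b_k} per explicitly listed tuple."""
    return [tuple(frozenset([b]) for b in t) for t in sorted(tuples)]


def gdnf_product(boxes1, boxes2):
    """Product of two GDNF relations: (P_i1 x Q_j1) x ... x (P_ik x Q_jk) for all i, j."""
    return [
        tuple(frozenset(product(P, Q)) for P, Q in zip(box1, box2))
        for box1 in boxes1
        for box2 in boxes2
    ]


def expand(boxes):
    """The explicit set of tuples denoted by a list of boxes."""
    return {t for box in boxes for t in product(*box)}


R1 = [(frozenset({0, 1}), frozenset({1}))]  # {0, 1} x {1}
R2 = explicit_to_gdnf({(2, 3), (3, 3)})
pq = gdnf_product(R1, R2)
explicit = {tuple(zip(s, t)) for s in expand(R1) for t in expand(R2)}
print(expand(pq) == explicit)  # True
print(len(pq))  # 2: one box per pair of input boxes
```

The output has one box per pair of input boxes, so with the first relation fixed its size grows linearly in the second, as noted above.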

It is known that, for a set $\mathcal{F}$ of primitive positive sentences, the problem PP-MC$_g(\mathcal{F})$ is FPT if the structures corresponding to $\mathcal{F}$ (that is, the set of structures $\{C[\psi] \mid \psi \in \mathcal{F}\}$) have bounded incidence width modulo homomorphic equivalence, and that this problem is W[1]-hard under nuFPT reductions otherwise. This follows from results in [8], to which we refer the reader for more details. This classification result, along with the theorem that follows, allows one to obtain a full FPT/W[1]-hard classification of the problems having the form EP-MC$_g(\mathcal{F})$. □

The following is our parameterized complexity classification theorem; $M$ denotes the mapping that is the subject of Proposition 3.

Theorem 9. Let $r$ be a representation. For each set $\mathcal{F}$ of existential positive sentences, the set $\mathcal{F}' = \bigcup_{\phi \in \mathcal{F}} M(\phi)$ of primitive positive sentences has the following properties.

— The problem EP-MC$_r(\mathcal{F})$ FPT Turing reduces to the problem PP-MC$_r(\mathcal{F}')$.

— The problem PP-MC$_r(\mathcal{F}')$ nuFPT reduces to the problem EP-MC$_r(\mathcal{F})$, under the assumption that $r$ is usable.

Proof. For the first property, we make use of Proposition 3. The FPT Turing reduction, given an instance of EP-MC$_r(\mathcal{F})$ representing the query $\mathbf{B} \models \phi$, computes $M(\phi)$, and then, for each $\psi \in M(\phi)$, queries the problem PP-MC$_r(\mathcal{F}')$ to determine if $\mathbf{B} \models \psi$. The reduction answers yes if and only if one of the queries was answered yes.

For the second property, let $\psi \in \mathcal{F}'$; we define an algorithm $A_\psi$ as follows. Fix $\phi \in \mathcal{F}$ such that $\psi \in M(\phi)$. Given an instance $(\psi, x)$ where $x$ represents $\mathbf{B}$, the algorithm computes an instance $(\phi, y)$ where $y$ represents $C[\psi] \times \mathbf{B}$. In particular, the string $y$ is computed as $p(t(C[\psi]), x)$, where the structure $C[\psi]$ is passed to $t$ in the explicit representation. By the definition of usable representation, the ensemble of algorithms $\{A_\psi\}$ is nuFPT.

We verify the correctness of the reduction as follows. We will make appeals to Chandra-Merlin, by which we mean Theorem 2 and the surrounding discussion. First, we claim that if $\psi' \in M(\phi)$ and $\psi' \neq \psi$, then $C[\psi] \times \mathbf{B} \not\models \psi'$. We prove this by contradiction; suppose that $C[\psi] \times \mathbf{B} \models \psi'$. By Chandra-Merlin, it follows that there is a homomorphism $C[\psi'] \to C[\psi] \times \mathbf{B}$. From Proposition 1, it follows that there is a homomorphism $C[\psi'] \to C[\psi]$; we obtain $\psi \models \psi'$ by Chandra-Merlin, which contradicts the description of $M(\phi)$ (Proposition 3). We then have the following equivalences and justifications:
$$\begin{aligned}
C[\psi] \times \mathbf{B} \models \phi &\iff C[\psi] \times \mathbf{B} \models \psi && \text{(just-established claim and Proposition 3)}\\
&\iff C[\psi] \to C[\psi] \times \mathbf{B} && \text{(Chandra-Merlin)}\\
&\iff C[\psi] \to \mathbf{B} && \text{(Proposition 1 and } C[\psi] \to C[\psi]\text{)}\\
&\iff \mathbf{B} \models \psi && \text{(Chandra-Merlin)}
\end{aligned}$$
The reduction is thus correct. □
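The second reduction in this proof can be exercised on a toy example with the Python sketch below. It is our own illustration with brute-force homomorphism tests, not an implementation of the usable-representation machinery: structures are given explicitly, so the translation $t$ plays no role. The sentence $\phi = \exists x\, E(x,x) \vee \exists x\, U(x)$ has $M(\phi) = \{\psi_1, \psi_2\}$ with $\psi_1 = \exists x\, E(x,x)$ and $\psi_2 = \exists x\, U(x)$, since neither disjunct entails the other. The code checks that $C[\psi] \times \mathbf{B} \models \phi$ agrees with $\mathbf{B} \models \psi$ for both choices of $\psi$ on a few structures; helper names are hypothetical.

```python
from itertools import product


def has_hom(A, B):
    """Brute-force test for a homomorphism A -> B."""
    for image in product(B[0], repeat=len(A[0])):
        h = dict(zip(A[0], image))
        if all(tuple(h[a] for a in t) in B[1][R] for R, ts in A[1].items() for t in ts):
            return True
    return False


def structure_product(A, B):
    universe = [(a, b) for a in A[0] for b in B[0]]
    rels = {R: {tuple(zip(s, t)) for s in A[1][R] for t in B[1][R]} for R in A[1]}
    return universe, rels


def satisfies_disjunction(B, disjunct_structures):
    """B |= psi_1 or ... or psi_m, via Chandra-Merlin: some C[psi_i] -> B."""
    return any(has_hom(C, B) for C in disjunct_structures)


C_psi1 = ([0], {"E": {(0, 0)}, "U": set()})  # C[exists x E(x, x)]
C_psi2 = ([0], {"E": set(), "U": {(0,)}})     # C[exists x U(x)]
M_phi = [C_psi1, C_psi2]

structures = [
    ([0, 1], {"E": {(0, 1), (1, 0)}, "U": set()}),
    ([0], {"E": {(0, 0)}, "U": set()}),
    ([0, 1], {"E": {(0, 1), (1, 0)}, "U": {(0,)}}),
    ([0], {"E": set(), "U": {(0,)}}),
]
for C in M_phi:
    for B in structures:
        # The PP instance (psi, B) becomes the EP instance (phi, C[psi] x B).
        assert satisfies_disjunction(structure_product(C, B), M_phi) == has_hom(C, B)
print("reduction agrees on all instances")
```

Note how the product with $C[\psi_1]$ empties the relation $U$, so the disjunct $\psi_2$ can no longer make $\phi$ true; this is the role of the claim proved above.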

We now look at the explicit representation under bounded arity. Let us define the tractability condition on a set $\mathcal{F}$ of existential positive sentences to be the condition that the set of structures
$$\bigcup_{\phi \in \mathcal{F}} \{\mathrm{core}(C[\psi]) \mid \psi \in M(\phi)\}$$
has bounded treewidth. Under the assumption of bounded arity, the tractability condition is the sole explanation for fixed-parameter tractability; this is shown by the following theorem, which gives a comprehensive classification of the sentence sets $\mathcal{F}$ such that EP-MC$(\mathcal{F})$ is fixed-parameter tractable.

Theorem 10. Let $\mathcal{F}$ be a set of existential positive sentences. If $\mathcal{F}$ satisfies the tractability condition, then EP-MC$(\mathcal{F})$ is fixed-parameter tractable; otherwise, under the assumption that $\mathcal{F}$ has bounded arity, EP-MC$(\mathcal{F})$ is W[1]-hard under nuFPT reductions.

Proof. We make use of Theorem 9 to establish both complexity results; let $\mathcal{F}'$ be as defined there. For tractability, it suffices to show that PP-MC$(\mathcal{F}')$ is FPT, since FPT is closed under FPT Turing reductions (Proposition 6); this follows from the definition of the tractability condition and Theorem 7. For the hardness result, if the tractability condition does not hold, we have W[1]-hardness of PP-MC$(\mathcal{F}')$ under nuFPT reductions by Theorem 8 (note that $\mathcal{F}'$ has bounded arity whenever $\mathcal{F}$ does), and so the result follows from Theorem 9 and Proposition 5. □

We now observe that the tractability condition can be alternatively characterized as logical equivalence to bounded-variable sets of formulas.

Proposition 11. Let $\mathcal{F}$ be a set of existential positive sentences. The following are equivalent:

(1) The set $\mathcal{F}$ satisfies the tractability condition.

(2) There exists $k \geq 1$ such that each sentence in $\mathcal{F}$ is logically equivalent to a sentence in $(\bigvee\mathrm{PP})^k$.

(3) There exists $k \geq 1$ such that each sentence in $\mathcal{F}$ is logically equivalent to a sentence in $\mathrm{EP}^k$.

Proof. The equivalence between (1) and (2) follows from Theorem 4. The implication (2) $\Rightarrow$ (3) is immediate. To establish the implication (3) $\Rightarrow$ (2), consider the conversion from existential positive sentences to disjunctions of primitive positive sentences given in the proof of Proposition 3: the syntactic transformations given there do not change the set of variables used. □

4 Compilability

We saw that, in the setting of bounded arity, equivalence of a sentence set to a bounded-variable fragment of existential positive logic is both necessary and sufficient for fixed-parameter tractability (of the sentence set). For each $k \geq 1$, we use $\approx\mathrm{EP}^k$ to denote the set that contains an existential positive sentence if and only if it is logically equivalent to a sentence in $\mathrm{EP}^k$. That is, $\approx\mathrm{EP}^k$ is the closure of $\mathrm{EP}^k$ under logical equivalence within the set of all existential positive sentences. Using this notation, the tractability condition on a sentence set $\mathcal{F}$ is equivalent to the condition that there exists $k \geq 1$ such that $\mathcal{F}$ is contained in $\approx\mathrm{EP}^k$ (by Proposition 11).

What about the classical notion of polynomial-time tractability? While polynomial-time tractability and fixed-parameter tractability coincide in primitive positive logic of bounded arity (Theorems 7 and 8), the picture is markedly different in existential positive logic. Let $\sigma$ be a signature; we now present a result that shows that, other than a degenerate case where the signature consists of one unary symbol, model checking the fragment $\approx\mathrm{EP}^2_\sigma$ is NP-hard; this contrasts with the fixed-parameter tractability of $\approx\mathrm{EP}^2_\sigma$.

Proposition 12. Let $\sigma$ be a non-empty signature.

— If $\sigma$ consists of one unary symbol, then EP-MC$(\mathrm{EP}_\sigma)$ is polynomial-time decidable.

— Otherwise, EP-MC$(\approx\mathrm{EP}^2_\sigma)$ is NP-complete.

Proof. Suppose that $\sigma = \{R\}$ consists of one unary symbol. Let $(\phi, \mathbf{B})$ be an instance of EP-MC$(\mathrm{EP}_\sigma)$. Suppose first that $R^{\mathbf{B}}$ is non-empty. In this case, fix a value $b \in R^{\mathbf{B}}$; setting all existentially quantified variables to $b$ makes each atom true, and hence makes the sentence $\phi$ true (on $\mathbf{B}$). Suppose next that $R^{\mathbf{B}}$ is empty. Then every atom of the form $R(x)$ is false on $\mathbf{B}$, whereas setting all existentially quantified variables to a single element makes every equality atom true; since $\phi$ is positive, $\phi$ is true on $\mathbf{B}$ if and only if it evaluates to true when each $R$-atom is replaced by false and each equality atom by true. Deciding if $\phi$ is true on $\mathbf{B}$ can thus be carried out in polynomial time by checking if $R^{\mathbf{B}}$ is non-empty and, if it is empty, performing this Boolean evaluation.



Suppose that $\sigma$ contains at least two symbols, and let $T$ and $F$ be symbols in $\sigma$. We show NP-hardness by reducing from Boolean CNF satisfiability; suppose we are given an instance $\phi$ of this problem with variables $v_1, \ldots, v_n$. Define $\mathbf{B}$ to be a structure with universe $B = \{0, 1\}$ such that $T^{\mathbf{B}} = \{(1, \ldots, 1)\}$ and $F^{\mathbf{B}} = \{(0, \ldots, 0)\}$. For each clause $C$ of $\phi$, we can define a disjunction of atoms $C'$ over $\sigma$ that has the same satisfying assignments (over $\mathbf{B}$). For instance, the clause $C = (v_1 \vee \neg v_2 \vee v_4)$ can be translated to $C' = T(v_1, \ldots, v_1) \vee F(v_2, \ldots, v_2) \vee T(v_4, \ldots, v_4)$. The resulting instance of EP-MC(${\approx}\mathrm{EP}^2_\sigma$) is $(\phi', \mathbf{B})$, where $\phi' = \exists v_1 \cdots \exists v_n \bigwedge_C C'$; here, the conjunction is over each clause $C$ of $\phi$. We verify that $\phi'$ is contained in ${\approx}\mathrm{EP}^1_\sigma$ (and hence in ${\approx}\mathrm{EP}^2_\sigma$), as follows. Convert $\phi'$ to a disjunction of primitive positive sentences using the syntactic transformations given in the proof of Proposition 3. Each resulting primitive positive sentence can be viewed as having the form $\exists v_1 \cdots \exists v_n (\gamma_1 \wedge \cdots \wedge \gamma_n)$, where $\gamma_i$ is the (possibly empty) conjunction of atoms using just the variable $v_i$. Such a sentence is logically equivalent to $(\exists v_1 \gamma_1) \wedge \cdots \wedge (\exists v_n \gamma_n)$, which in turn is logically equivalent to a 1-variable sentence under renaming of variables.

The remaining case is that $\sigma$ contains one symbol $S$ of arity $k \geq 2$. In this case, we define $\mathbf{B}$ to be the structure with universe $B = \{0, 1\}$ and $S^{\mathbf{B}} = \{(0, 1, \ldots, 1)\}$, where $(0, 1, \ldots, 1)$ denotes the tuple with one entry of 0 followed by $(k - 1)$ entries of 1. We use the reduction of the previous case, but use $\exists x\, S(x, v, \ldots, v)$ in place of $T(v, \ldots, v)$ and $\exists x\, S(v, x, \ldots, x)$ in place of $F(v, \ldots, v)$; over $\mathbf{B}$, the former holds exactly when $v = 1$ and the latter exactly when $v = 0$. It was argued that each sentence used in the previous case is logically equivalent, via syntactic transformation, to a boolean combination of primitive positive sentences, each using one variable. By applying the same syntactic transformations to each sentence used in the present reduction, each such sentence is seen to be a boolean combination of primitive positive sentences, each of which uses two variables. □
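The reduction in the second case of the proof can be sanity-checked on small inputs. The following Python sketch is only an illustration: clauses are encoded as lists of nonzero integers (+i for $v_i$, -i for $\neg v_i$), the arities of $T$ and $F$ are arbitrary assumptions, and truth of $\phi'$ on $\mathbf{B}$ is decided by brute force over $\{0,1\}^n$, purely so that it can be compared against a brute-force satisfiability test.

```python
from itertools import product

ARITY_T, ARITY_F = 3, 2  # illustrative arities; the reduction works for any arities

# The structure B: universe {0, 1}, T^B = {(1, ..., 1)}, F^B = {(0, ..., 0)}.
T_B = {(1,) * ARITY_T}
F_B = {(0,) * ARITY_F}


def translate_clause(clause):
    """Map a clause to C': literal v_i -> T(v_i,...,v_i), literal not v_i -> F(v_i,...,v_i)."""
    return [("T", lit) if lit > 0 else ("F", -lit) for lit in clause]


def atom_true(atom, assignment):
    """Evaluate T(v_i, ..., v_i) or F(v_i, ..., v_i) on B under the given assignment."""
    symbol, i = atom
    value = assignment[i - 1]
    if symbol == "T":
        return (value,) * ARITY_T in T_B
    return (value,) * ARITY_F in F_B


def phi_prime_true_on_B(cnf, n):
    """Decide B |= exists v_1 ... v_n AND_C C' by trying every assignment in {0,1}^n."""
    translated = [translate_clause(c) for c in cnf]
    return any(
        all(any(atom_true(a, values) for a in clause) for clause in translated)
        for values in product((0, 1), repeat=n)
    )


def cnf_satisfiable(cnf, n):
    """Brute-force Boolean satisfiability, for comparison."""
    return any(
        all(any(values[abs(lit) - 1] == (lit > 0) for lit in c) for c in cnf)
        for values in product((False, True), repeat=n)
    )


example = [[1, -2, 4], [-1], [2]]  # (v1 or not v2 or v4) and (not v1) and v2
print(translate_clause(example[0]))  # [('T', 1), ('F', 2), ('T', 4)]
print(phi_prime_true_on_B(example, 4), cnf_satisfiable(example, 4))  # True True
```

The brute-force evaluation is exponential and serves only as a check; the point of the proposition is that deciding $\phi'$ on $\mathbf{B}$ is as hard as satisfiability.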

The hardness result of Proposition 12 can be sharply contrasted with the observation, due to Vardi [22], that model checking a bounded-variable fragment of first-order logic is polynomial-time tractable. Here, tractability is obtained via the natural bottom-up evaluation algorithm that computes the satisfying assignments of each subformula; observe that each subformula has at most $|B|^k$ satisfying assignments when the formula is $k$-variable and $|B|$ is the size of the structure's universe.

Proposition 13. (Vardi [22]) For each $k \geq 1$, the natural evaluation algorithm decides FO-MC($\mathrm{FO}^k$) in polynomial time.
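The natural bottom-up algorithm can be sketched as follows. This is an illustrative Python sketch rather than Vardi's own presentation: formulas are encoded as nested tuples (an encoding assumed here), and each subformula is evaluated to the set of assignments to its free variables that satisfy it. For a $k$-variable formula every such relation has at most $k$ columns and hence at most $|B|^k$ rows, which is where the polynomial bound comes from.

```python
from itertools import product

# Formula encoding (an assumption of this sketch):
#   ("atom", R, (x1, ..., xr)), ("and", f, g), ("or", f, g), ("not", f),
#   ("exists", x, f), ("forall", x, f); variables are strings.


def free_vars(phi):
    tag = phi[0]
    if tag == "atom":
        return frozenset(phi[2])
    if tag == "not":
        return free_vars(phi[1])
    if tag in ("and", "or"):
        return free_vars(phi[1]) | free_vars(phi[2])
    if tag in ("exists", "forall"):
        return free_vars(phi[2]) - {phi[1]}
    raise ValueError(f"unknown connective {tag!r}")


def evaluate(phi, universe, rels):
    """Return (vs, sat): vs lists phi's free variables in sorted order and sat is
    the set of value tuples (aligned with vs) that satisfy phi."""
    tag, vs = phi[0], tuple(sorted(free_vars(phi)))
    if tag == "atom":
        _, name, args = phi
        sat = set()
        for tup in rels[name]:
            env = {}
            # Keep the tuple only if repeated variables receive equal values.
            if all(env.setdefault(x, a) == a for x, a in zip(args, tup)):
                sat.add(tuple(env[x] for x in vs))
        return vs, sat
    if tag == "not":
        _, inner = evaluate(phi[1], universe, rels)
        return vs, {t for t in product(universe, repeat=len(vs)) if t not in inner}
    if tag in ("and", "or"):
        v1, s1 = evaluate(phi[1], universe, rels)
        v2, s2 = evaluate(phi[2], universe, rels)
        sat = set()
        for t in product(universe, repeat=len(vs)):  # at most |B|^k tuples
            env = dict(zip(vs, t))
            left = tuple(env[x] for x in v1) in s1
            right = tuple(env[x] for x in v2) in s2
            if (left and right) if tag == "and" else (left or right):
                sat.add(t)
        return vs, sat
    # Quantifiers: try every value for the bound variable.
    bound = phi[1]
    v1, s1 = evaluate(phi[2], universe, rels)
    test = any if tag == "exists" else all
    sat = set()
    for t in product(universe, repeat=len(vs)):
        env = dict(zip(vs, t))
        checks = []
        for a in universe:
            extended = {**env, bound: a}
            checks.append(tuple(extended[y] for y in v1) in s1)
        if test(checks):
            sat.add(t)
    return vs, sat


def model_check(sentence, universe, rels):
    """A sentence has no free variables, so it holds iff the empty tuple survives."""
    return () in evaluate(sentence, universe, rels)[1]


# "There is a directed path with 3 edges", written with only the variables x and y.
inner = ("exists", "y", ("atom", "E", ("x", "y")))
middle = ("exists", "x", ("and", ("atom", "E", ("y", "x")), inner))
path3 = ("exists", "x", ("exists", "y", ("and", ("atom", "E", ("x", "y")), middle)))
print(model_check(path3, [0, 1, 2, 3], {"E": {(0, 1), (1, 2), (2, 3)}}))  # True
print(model_check(path3, [0, 1, 2], {"E": {(0, 1), (1, 2)}}))  # False
```

The example reuses $x$ and $y$ to express a property that would naively need four variables, so every intermediate relation stays binary.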

A side-by-side comparison of Proposition 12 with Proposition 13 points to an intriguing state of affairs. For $k \geq 2$, model checking ${\approx}\mathrm{EP}^k$ is NP-hard (on any signature other than one consisting of a single unary symbol); yet each sentence $\phi$ in ${\approx}\mathrm{EP}^k$ is logically equivalent to a sentence $\phi'$ in $\mathrm{EP}^k$, a fragment on which model checking is polynomial-time tractable. This implies that, unless P = NP, such a translation $\phi \mapsto \phi'$ cannot be performed in polynomial time, for if it could, performing the translation and then employing Vardi's observation (Proposition 13) would place model checking of ${\approx}\mathrm{EP}^k$ in polynomial time!

Why, then, is this translation not polynomial-time computable? We can distinguish between two potential explanations. The first is a computational complexity explanation: for each sentence $\phi \in {\approx}\mathrm{EP}^k$, there exists an equivalent sentence $\phi' \in \mathrm{EP}^k$ having length polynomial in the length of $\phi$. That is, short equivalent sentences exist, but they are difficult to compute.¹ The second explanation, based on formula length, is that no short equivalent sentences exist, that is, there is no polynomial-length translation $\phi \mapsto \phi'$ from ${\approx}\mathrm{EP}^k$ to $\mathrm{EP}^k$. The following theorem shows that, overwhelmingly, the second explanation is the valid one. In particular, the theorem shows that this second explanation holds for all signatures other than those having finitely many unary symbols; this is proved in a general setting showing that there is no polynomial-length translation from ${\approx}\mathrm{EP}^k$ to any format that allows for polynomial-time query evaluation.

To formalize the theorem, we make use of a framework that allows one to discuss and relate the compilability of various problems. The framework that we use comes from the work of Cadoli et al. [5]; here, we give a self-contained presentation using slightly different terminology and notions. Let $Q \subseteq \Sigma^* \times \Sigma^*$ and $Q' \subseteq \Sigma^* \times \Sigma^*$ be decision problems consisting of pairs. We say that $Q$ compiles to $Q'$ via the compilation $c : \Sigma^* \to \Sigma^*$ if for all pairs $(x, y) \in \Sigma^* \times \Sigma^*$, it holds that $(x, y) \in Q$ if and only if $(c(x), y) \in Q'$. We say that a map $c$ has constant length if there exists a constant $d \geq 1$ such that for all $x \in \Sigma^*$, it holds that $|c(x)| \leq d$; and we say that $c$ has polynomial length if there exists a polynomial $p$ on the natural numbers such that for all $x \in \Sigma^*$, it holds that $|c(x)| \leq p(|x|)$. We say that $Q$ constant-length compiles to $Q'$ if there exists a constant-length map $c : \Sigma^* \to \Sigma^*$ such that $Q$ compiles to $Q'$ via $c$, and similarly we say that $Q$ polynomial-length compiles to $Q'$ if there exists a polynomial-length map $c : \Sigma^* \to \Sigma^*$ such that $Q$ compiles to $Q'$ via $c$.

¹ This is the situation in primitive positive logic; this can be inferred from Theorem 3 and results in [17].

Theorem 14. Let $\sigma$ be a non-empty signature.

- If $\sigma$ consists of finitely many unary symbols, then EP-MC($\mathrm{EP}_\sigma$) is constant-length compilable to EP-MC($\mathrm{EP}^1_\sigma$).

- Otherwise, there exists $k \geq 1$ such that EP-MC(${\approx}\mathrm{EP}^k_\sigma$) is not polynomial-length compilable to a polynomial-time decidable problem (assuming that the polynomial hierarchy does not collapse). In particular, this holds for $k = 1$ on a signature having infinitely many unary symbols, and for $k = 6$ on a signature having a symbol of arity 2 or greater.

As a corollary to this theorem, we obtain that, in the case of a non-compilable signature $\sigma$ (that is, a signature falling into the second case of the theorem) and for the $k$ given by the theorem, there is no polynomial-length equivalence-preserving translation from ${\approx}\mathrm{EP}^k_\sigma$ to any bounded-variable fragment $\mathrm{FO}^m$ of first-order logic. This is formalized as follows.

Corollary 15. Let $\sigma$ be a signature that does not consist of finitely many unary symbols, and assume that the polynomial hierarchy does not collapse. There exists $k \geq 1$ such that for all $m \geq 1$, the following holds: there does not exist a polynomial-length mapping $f$ that gives, for each sentence $\phi$ in ${\approx}\mathrm{EP}^k_\sigma$, a logically equivalent sentence $f(\phi)$ in $\mathrm{FO}^m$. In particular, this holds for the values of $k$ described in the statement of Theorem 14.

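Proof. Let $k$ be as in the statement of Theorem 14, and let $m \geq 1$. By Proposition 13, the problem FO-MC($\mathrm{FO}^m$) is polynomial-time decidable. So by Theorem 14, EP-MC(${\approx}\mathrm{EP}^k_\sigma$) is not polynomial-length compilable to FO-MC($\mathrm{FO}^m$). A polynomial-length mapping $f$ as in the statement would be such a compilation (take $c = f$), so the corollary follows. □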

Our decision to investigate the compilability properties of existential positive logic was partially inspired by the interesting discussion of Adler and Weyer [2]. Let ${\approx}\mathrm{FO}^k$ denote the set containing each first-order sentence that is logically equivalent to a sentence in $\mathrm{FO}^k$. Adler and Weyer conjecture [2, Conjecture 7.1] that a non-elementary length increase is necessary for translating a certain fragment of ${\approx}\mathrm{FO}^k$ (defined in their article) to logically equivalent sentences in $\mathrm{FO}^k$. While their conjecture concerns elementary versus non-elementary growth, here we have shown (Theorem 14 and Corollary 15) a dichotomy between constant growth and superpolynomial growth for translating ${\approx}\mathrm{EP}^k$.

We devote the rest of this section to proving Theorem 14. The bulk of the effort goes into establishing the following theorem. Recall that the Directed Hamiltonian Circuit problem is the problem of deciding, given a directed graph (assumed here to have $n \geq 2$ vertices), whether or not it has a directed Hamiltonian circuit, by which is meant an ordering $u_0, \ldots, u_{n-1}$ of the vertices of the graph such that for all $i \in \{0, \ldots, n-1\}$, the pair $(u_i, u_{(i+1) \bmod n})$ is a directed edge. This problem is well known to be NP-complete.
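For concreteness, the definition can be checked directly on small graphs. This brute-force Python sketch is illustrative only; it assumes the vertices are numbered $0, \ldots, n-1$ and the edges are given as a set of ordered pairs. It runs in exponential time, as one would expect for an NP-complete problem.

```python
from itertools import permutations


def has_directed_hamiltonian_circuit(n, edges):
    """Brute-force test for a directed Hamiltonian circuit on vertices 0..n-1.

    An ordering u_0, ..., u_{n-1} qualifies when (u_i, u_{(i+1) mod n}) is an edge
    for every i. Rotations describe the same circuit, so u_0 is fixed to 0.
    """
    for rest in permutations(range(1, n)):
        order = (0,) + rest
        if all((order[i], order[(i + 1) % n]) in edges for i in range(n)):
            return True
    return False


print(has_directed_hamiltonian_circuit(3, {(0, 1), (1, 2), (2, 0)}))  # True
print(has_directed_hamiltonian_circuit(3, {(0, 1), (1, 2), (0, 2)}))  # False
```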

Theorem 16. Let $\sigma$ be a signature containing a relation symbol $E$ of arity $m \geq 2$. There exists a sequence $\{H_n\}_{n \geq 2}$ of sentences in ${\approx}\mathrm{EP}^6_\sigma$ such that Directed Hamiltonian Circuit many-one polynomial-time reduces to EP-MC($\{H_n\}_{n \geq 2}$) via a reduction that (for all $n \geq 2$) sends a directed graph with $n$ vertices to an instance using $H_n$.

We focus on establishing this theorem in the case of a binary symbol $E$, and will later indicate how this case yields the general case. For each $n \geq 0$, let $\sigma_n$ be the signature $\{E, L_1, \ldots, L_n\}$, where $E$ is a binary symbol and the $L_i$ are unary symbols. We will call a structure over $\sigma_0 = \{E\}$ a digraph, and a structure over $\sigma_n$ a labelled digraph when $n \geq 1$.



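Let $\mathbf{B}$ be a labelled digraph over $\sigma_n$. We define a digraph $\mathbf{B}^*$ from the labelled digraph $\mathbf{B}$ as follows.

For each $b \in B$, we define a gadget digraph $\mathbf{G}_b$ which has universe

$$G_b = \{b^s, b^c, b^d, b^{s1}, b^{t1}, b^{s2}, b^{t2}, \ldots, b^{sn}, b^{tn}, b^t\} \cup \{b^{ui} \mid b \in L_i^{\mathbf{B}}\} \cup \{b^{vi} \mid b \in L_i^{\mathbf{B}}\}$$

and edge relation

$$\begin{aligned} E^{\mathbf{G}_b} = {} & \{(b^c, b^s), (b^c, b^d), (b^s, b^d), (b^d, b^{s1})\} \cup \{(b^{si}, b^{ti}) \mid i \in \{1, \ldots, n\}\} \\ & \cup \{(b^{ti}, b^{s(i+1)}) \mid i \in \{1, \ldots, n-1\}\} \cup \{(b^{tn}, b^t)\} \\ & \cup \{(b^{ui}, b^{si}), (b^{vi}, b^{ti}), (b^{vi}, b^{ui}) \mid b \in L_i^{\mathbf{B}}\}. \end{aligned}$$

We now define the digraph $\mathbf{B}^*$. The universe of this structure is

$$B^* = \bigcup_{b \in B} G_b,$$

and the edge relation is

$$E^{\mathbf{B}^*} = \Big(\bigcup_{b \in B} E^{\mathbf{G}_b}\Big) \cup \{(b^t, b'^s) \mid (b, b') \in E^{\mathbf{B}}\}.$$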
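To make the gadget construction concrete, here is a minimal Python sketch that builds $\mathbf{B}^*$ from a labelled digraph. The representation is an assumption of this sketch: $\mathbf{B}$ is given by its vertex list, its edge set $E^{\mathbf{B}}$, and a dictionary sending each $i \in \{1, \ldots, n\}$ to $L_i^{\mathbf{B}}$; each vertex $b^x$ of $\mathbf{B}^*$ is encoded as the pair (b, "x"), so that, for instance, $b^{s2}$ becomes (b, "s2"). The function name `star` is hypothetical, and $n \geq 1$ is required.

```python
def star(vertices, edges, labels, n):
    """Return (V, E) for the digraph B* built from a labelled digraph over sigma_n."""
    V, E = set(), set()
    for b in vertices:
        # Vertices present for every b: s, c, d, t and s1, t1, ..., sn, tn.
        V.update((b, x) for x in ("s", "c", "d", "t"))
        V.update((b, f"{x}{i}") for i in range(1, n + 1) for x in ("s", "t"))
        E.update({((b, "c"), (b, "s")), ((b, "c"), (b, "d")),
                  ((b, "s"), (b, "d")), ((b, "d"), (b, "s1")),
                  ((b, f"t{n}"), (b, "t"))})
        for i in range(1, n + 1):
            E.add(((b, f"s{i}"), (b, f"t{i}")))
            if i < n:
                E.add(((b, f"t{i}"), (b, f"s{i + 1}")))
            if b in labels.get(i, ()):
                # Label gadget: b^ui and b^vi exist only when b is in L_i^B.
                V.update({(b, f"u{i}"), (b, f"v{i}")})
                E.update({((b, f"u{i}"), (b, f"s{i}")),
                          ((b, f"v{i}"), (b, f"t{i}")),
                          ((b, f"v{i}"), (b, f"u{i}"))})
    # Edges between gadgets: b^t -> b'^s for every edge (b, b') of B.
    E.update(((b, "t"), (b2, "s")) for (b, b2) in edges)
    return V, E


# Two vertices 0 -> 1 with one label; only vertex 0 carries L_1.
V, E = star([0, 1], {(0, 1)}, {1: {0}}, 1)
print(len(V), len(E))  # 14 16
print(sorted(x for (x, y) in E if y == (1, "t1")))  # [(1, 's1')]
```

The last line illustrates the fact used in the proof of Lemma 17: when $b \notin L_i^{\mathbf{B}}$, the vertex $b^{si}$ is the only in-neighbour of $b^{ti}$.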

The following lemma gives a key feature of this construction: it gives a translation from labelled 
digraphs to digraphs that strongly preserves the homomorphism relation. 

Lemma 17. Let $\mathbf{A}, \mathbf{B}$ be labelled digraphs over $\sigma_n$. There exists a homomorphism $\mathbf{A} \to \mathbf{B}$ if and only if there exists a homomorphism $\mathbf{A}^* \to \mathbf{B}^*$.

Proof. We first show the forward direction. Suppose that $h : \mathbf{A} \to \mathbf{B}$ is a homomorphism. We define a mapping $h^* : A^* \to B^*$ as follows. For each $a \in A$, the mapping $h^*$ is defined so as to map $G_a$ to $G_{h(a)}$ in the natural way, that is, $h^*(a^s) = h(a)^s$, $h^*(a^c) = h(a)^c$, and so forth. Note that if for some $i$ it holds that $a^{ui}, a^{vi} \in G_a$, then by the assumption that $h$ is a homomorphism $\mathbf{A} \to \mathbf{B}$, it also holds that $h(a)^{ui}, h(a)^{vi} \in G_{h(a)}$. It is straightforward to verify that, for each $a \in A$, one has $h^*(E^{\mathbf{G}_a}) \subseteq E^{\mathbf{G}_{h(a)}}$. Now consider an edge $(a^t, a'^s) \in E^{\mathbf{A}^*}$. By definition of $\mathbf{A}^*$, one has $(a, a') \in E^{\mathbf{A}}$, implying that $(h(a), h(a')) \in E^{\mathbf{B}}$, from which one has, by the definition of $\mathbf{B}^*$, that $(h(a)^t, h(a')^s) \in E^{\mathbf{B}^*}$.

We now show the backward direction. Suppose that $h^* : \mathbf{A}^* \to \mathbf{B}^*$ is a homomorphism. We first establish the following claim: for all $a \in A$, there exists $b \in B$ such that $(h^*(a^s), h^*(a^c), h^*(a^d)) = (b^s, b^c, b^d)$. Let $a \in A$. Observe that $h^*$ acts injectively on $\{a^s, a^c, a^d\}$, for if not, one would have distinct $\alpha, \alpha' \in \{a^s, a^c, a^d\}$ with $h^*(\alpha) = h^*(\alpha')$ and $(\alpha, \alpha') \in E^{\mathbf{A}^*}$, implying that $\mathbf{B}^*$ contains a self-loop, a contradiction. The subgraph of $\mathbf{B}^*$ induced by $\{h^*(a^s), h^*(a^c), h^*(a^d)\}$ thus has three vertices and contains the three edges $(h^*(a^c), h^*(a^s))$, $(h^*(a^c), h^*(a^d))$, $(h^*(a^s), h^*(a^d))$; the claim follows by inspection of the definition of $\mathbf{B}^*$.

The claim we just established allows us to naturally define, from $h^*$, a mapping $h : A \to B$, where $h(a) = b$ if and only if $(h^*(a^s), h^*(a^c), h^*(a^d)) = (b^s, b^c, b^d)$. We observe that when $h(a) = b$, the $\mathbf{A}^*$-path $a^d, a^{s1}, a^{t1}, \ldots, a^{sn}, a^{tn}, a^t$ is mapped under $h^*$ to the $\mathbf{B}^*$-path $b^d, b^{s1}, b^{t1}, \ldots, b^{sn}, b^{tn}, b^t$. We have $h^*(a^d) = b^d$ by the definition of $h$, and the other equalities follow immediately from the assumption that $h^*$ is a homomorphism $\mathbf{A}^* \to \mathbf{B}^*$, along with the fact that in each of the two paths, each vertex other than the last has outdegree 1.

We now verify that $h$ is a homomorphism $\mathbf{A} \to \mathbf{B}$. Suppose that $(a, a') \in E^{\mathbf{A}}$, and let $(b, b') = (h(a), h(a'))$. By definition of $\mathbf{A}^*$, we have $(a^t, a'^s) \in E^{\mathbf{A}^*}$, from which it follows that $(b^t, b'^s) = (h^*(a^t), h^*(a'^s)) \in E^{\mathbf{B}^*}$; by the definition of $\mathbf{B}^*$, we have $(b, b') \in E^{\mathbf{B}}$. Suppose that $a \in L_i^{\mathbf{A}}$, and let $b = h(a)$. We want to show that $b \in L_i^{\mathbf{B}}$; it suffices to show that $b^{ui} \in B^*$. We demonstrate this by contradiction. Assume that $b^{ui} \notin B^*$. Since $a^{vi}$ has an edge to $a^{ti}$ in $\mathbf{A}^*$ and $h^*(a^{ti}) = b^{ti}$, we have $h^*(a^{vi}) = b^{si}$, as by assumption $b^{si}$ is the only vertex having an edge to $b^{ti}$ in $\mathbf{B}^*$. But then, since $a^{vi}, a^{ui}, a^{si}$ is a length-2 path in $\mathbf{A}^*$, it must be that $h^*(a^{vi}) = b^{si}, h^*(a^{ui}), h^*(a^{si}) = b^{si}$ is a length-2 path in $\mathbf{B}^*$, contradicting that there is no length-2 path from $b^{si}$ to itself in $\mathbf{B}^*$. □



We define a sequence of existential positive sentences $\{H_n\}_{n \geq 2}$ as follows. Let $n \geq 2$. Let $\mathbf{A}$ be the labelled digraph on $\sigma_n$ with universe $\{v_1, \ldots, v_n\}$, where $E^{\mathbf{A}} = \emptyset$ and where $L_i^{\mathbf{A}} = \{v_i\}$ for all $i \in \{1, \ldots, n\}$. Let $P_n \psi_n$ be the canonical query $Q[\mathbf{A}^*]$ of $\mathbf{A}^*$, where $P_n$ denotes the quantifier prefix and $\psi_n$ the quantifier-free part. We define $H_n$ to be the sentence

$$P_n \Big(\psi_n \wedge \bigwedge_{i=1}^{n} \bigvee_{j=1}^{n} E(v_i^t, v_j^s)\Big).$$

We use $\mathbf{C}_n$ to denote the directed cycle on $\sigma_n$ where all vertices have all labels, that is, $\mathbf{C}_n$ is the structure with universe $C_n = \{0, \ldots, n-1\}$, where $E^{\mathbf{C}_n} = \{(i, (i+1) \bmod n) \mid i \in \{0, \ldots, n-1\}\}$ and where $L_i^{\mathbf{C}_n} = C_n$ for all $i \in \{1, \ldots, n\}$.
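The structures used in the reduction can be built mechanically. The following Python sketch is illustrative: a labelled digraph is represented as a triple (vertices, edges, labels), with labels a dictionary from $i$ to $L_i$, and the helper names are hypothetical. The product is not redefined in this section; the sketch assumes the usual categorical product of structures, in which a tuple of pairs lies in a relation exactly when each coordinate tuple does. This is the product for which a structure maps homomorphically into $\mathbf{B} \times \mathbf{C}_n$ exactly when it maps into both factors.

```python
def cycle_structure(n):
    """C_n: the directed n-cycle on {0, ..., n-1}; every vertex carries every label L_1..L_n."""
    vertices = list(range(n))
    edges = {(i, (i + 1) % n) for i in range(n)}
    labels = {i: set(vertices) for i in range(1, n + 1)}
    return vertices, edges, labels


def product_structure(first, second):
    """Categorical product of two labelled digraphs given as (vertices, edges, labels)."""
    (V1, E1, L1), (V2, E2, L2) = first, second
    vertices = [(a, b) for a in V1 for b in V2]
    edges = {((a, b), (a2, b2)) for (a, a2) in E1 for (b, b2) in E2}
    labels = {i: {(a, b) for a in L1.get(i, set()) for b in L2.get(i, set())}
              for i in set(L1) | set(L2)}
    return vertices, edges, labels


# A 3-vertex labelled digraph B in which vertex i carries exactly the label L_i.
B = ([1, 2, 3], {(1, 2), (2, 3), (3, 1)}, {1: {1}, 2: {2}, 3: {3}})
P = product_structure(B, cycle_structure(3))
print(len(P[0]), len(P[1]), sorted(len(s) for s in P[2].values()))  # 9 9 [3, 3, 3]
```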

Lemma 18. Let $\mathbf{B}$ be a labelled digraph over $\sigma_n$ with $n$ vertices where each vertex is given a unique label, that is, where the sets $L_1^{\mathbf{B}}, \ldots, L_n^{\mathbf{B}}$ form a partition of the universe $B$. The digraph $(B, E^{\mathbf{B}})$ has a directed Hamiltonian circuit if and only if $(\mathbf{B} \times \mathbf{C}_n)^* \models H_n$.

Proof. It is straightforward to verify that $H_n$ is logically equivalent to the sentence

$$\bigvee_f P_n \Big(\psi_n \wedge \bigwedge_{i \in \{1, \ldots, n\}} E(v_i^t, v_{f(i)}^s)\Big),$$

where the disjunction is over all mappings $f : \{1, \ldots, n\} \to \{1, \ldots, n\}$. This sentence is in turn straightforwardly verified to be equivalent to

$$\bigvee_f Q[\mathbf{A}_f^*],$$

where $\mathbf{A}_f$ is the labelled digraph on $\sigma_n$ with universe $\{v_1, \ldots, v_n\}$, where $E^{\mathbf{A}_f} = \{(v_i, v_{f(i)}) \mid i \in \{1, \ldots, n\}\}$ and $L_i^{\mathbf{A}_f} = \{v_i\}$ for all $i \in \{1, \ldots, n\}$. We thus obtain that $(\mathbf{B} \times \mathbf{C}_n)^* \models H_n$ if and only if there exists a mapping $f : \{1, \ldots, n\} \to \{1, \ldots, n\}$ such that $\mathbf{A}_f^* \to (\mathbf{B} \times \mathbf{C}_n)^*$; by appeal to Lemma 17, this latter condition holds if and only if there exists a mapping $f : \{1, \ldots, n\} \to \{1, \ldots, n\}$ such that $\mathbf{A}_f \to \mathbf{B} \times \mathbf{C}_n$. We make use of this last condition to establish the lemma.

For the sake of notation, we assume without loss of generality that the structure $\mathbf{B}$ has universe $\{v_1, \ldots, v_n\}$ and that $L_i^{\mathbf{B}} = \{v_i\}$ for all $i \in \{1, \ldots, n\}$.

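Suppose that $\mathbf{B}$ contains a directed Hamiltonian circuit. Then there exists a mapping $f : \{1, \ldots, n\} \to \{1, \ldots, n\}$ such that $E^{\mathbf{A}_f}$ is a directed cycle of length $n$ and $E^{\mathbf{A}_f} \subseteq E^{\mathbf{B}}$. The identity mapping on $\{v_1, \ldots, v_n\}$ is then a homomorphism $\mathbf{A}_f \to \mathbf{B}$, and since $E^{\mathbf{A}_f}$ is a directed cycle of length $n$, there exists a homomorphism $\mathbf{A}_f \to \mathbf{C}_n$. By appeal to Proposition 1, we have $\mathbf{A}_f \to \mathbf{B} \times \mathbf{C}_n$.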

Suppose that there exists a mapping $f : \{1, \ldots, n\} \to \{1, \ldots, n\}$ such that $\mathbf{A}_f \to \mathbf{B} \times \mathbf{C}_n$. By Proposition 1, it holds that $\mathbf{A}_f \to \mathbf{B}$ and $\mathbf{A}_f \to \mathbf{C}_n$. We claim that $E^{\mathbf{A}_f}$ is a cycle of length $n$. We reason as follows. Fix a vertex $w_1$ arbitrarily. Each vertex has outdegree 1 and hence exactly one successor. For $i \in \{1, \ldots, n\}$, define $w_{i+1}$ to be the successor of $w_i$. If the sequence $w_1, \ldots, w_n$ is a listing of all $n$ vertices in $\{v_1, \ldots, v_n\}$ and $w_{n+1} = w_1$, then the claim is established. Otherwise, there exist values $i, j$ with $1 \leq i < j \leq n + 1$ and $j - i < n$ such that $w_i = w_j$. Let $i, j$ be two such values such that $j - i$ is minimized. Then $w_i, \ldots, w_j$ is a simple cycle of length strictly less than $n$; its image under the homomorphism from $\mathbf{A}_f$ to $\mathbf{C}_n$ is a closed walk of length $j - i$, where $1 \leq j - i < n$, which is impossible because every closed walk in the directed $n$-cycle $\mathbf{C}_n$ has length divisible by $n$. The claim is established. Let $h$ be a homomorphism from $\mathbf{A}_f$ to $\mathbf{B}$. Due to the labels, $h$ is the identity mapping, and so, since $E^{\mathbf{A}_f}$ is a cycle of length $n$, we obtain that $(B, E^{\mathbf{B}})$ contains a directed Hamiltonian circuit. □
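The claim that $E^{\mathbf{A}_f}$ must be a single cycle of length $n$ can be checked exhaustively for small $n$. In this illustrative sketch (function names hypothetical), a mapping $f$ is a dictionary on $\{1, \ldots, n\}$; `is_single_n_cycle` follows successors exactly as in the proof, and `maps_to_cycle` searches by brute force for a homomorphism $\mathbf{A}_f \to \mathbf{C}_n$, ignoring labels because every vertex of $\mathbf{C}_n$ carries every label.

```python
from itertools import product


def is_single_n_cycle(f, n):
    """Follow successors w_1 = 1, w_{i+1} = f(w_i) as in the proof."""
    w, seen = 1, []
    for _ in range(n):
        seen.append(w)
        w = f[w]
    return w == 1 and len(set(seen)) == n


def maps_to_cycle(f, n):
    """Is there g: {1..n} -> {0..n-1} with g(f(i)) = g(i) + 1 mod n for all i?"""
    return any(
        all(g[f[i] - 1] == (g[i - 1] + 1) % n for i in range(1, n + 1))
        for g in product(range(n), repeat=n)
    )


def claim_holds(n):
    """Check, for every mapping f: {1..n} -> {1..n}, that A_f -> C_n iff f is one n-cycle."""
    for values in product(range(1, n + 1), repeat=n):
        f = dict(zip(range(1, n + 1), values))
        if maps_to_cycle(f, n) != is_single_n_cycle(f, n):
            return False
    return True


print([claim_holds(n) for n in (2, 3, 4)])  # [True, True, True]
```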

We now give a lemma that will help us measure the treewidth of a structure having the form $\mathbf{B}^*$. We will make use of the following auxiliary structure. Let $\mathbf{B}$ be a labelled digraph over $\sigma_n$. We define a digraph $\mathbf{B}^+$ from $\mathbf{B}$ in the following way. The structure $\mathbf{B}^+$ has universe $B^+ = \{b^s, b^t \mid b \in B\}$ and edge relation $E^{\mathbf{B}^+} = \{(b^s, b^t) \mid b \in B\} \cup \{(b^t, b'^s) \mid (b, b') \in E^{\mathbf{B}}\}$.



Lemma 19. Let $\mathbf{B}$ be a labelled digraph over $\sigma_n$. It holds that $\mathrm{tw}(\mathbf{B}^*) \leq \max(\mathrm{tw}(\mathbf{B}^+), 5)$.

Proof. We first show that, for each $b \in B$, the gadget digraph $\mathbf{G}_b$ has a tree decomposition $(T_b, \beta_b)$ of width 5 where $b^s$ and $b^t$ are contained in every bag, and where $T_b$ is a path. It suffices to show this under the assumption that $b \in L_i^{\mathbf{B}}$ for all $i \in \{1, \ldots, n\}$, since removing $b$ from the relations translates to removing vertices and edges from $\mathbf{G}_b$. In what follows, we list the bags of this tree decomposition in order, and for readability exclude $b^s$ and $b^t$ from each bag:

$$\{b^c, b^d\},\ \{b^d, b^{s1}\},\ \{b^{s1}, b^{t1}, b^{u1}, b^{v1}\},\ \{b^{t1}, b^{s2}\},\ \{b^{s2}, b^{t2}, b^{u2}, b^{v2}\},\ \{b^{t2}, b^{s3}\},\ \ldots,\ \{b^{sn}, b^{tn}, b^{un}, b^{vn}\}.$$

It is straightforwardly verified that each edge of $\mathbf{G}_b$ is contained in a bag and that each vertex occurs in a contiguous run of bags; since the largest bag has 6 elements, the tree decomposition has width 5.

Now suppose that $(T, \beta)$ is a tree decomposition of $\mathbf{B}^+$ of width $\mathrm{tw}(\mathbf{B}^+)$. We show how to augment this tree decomposition to obtain a tree decomposition of $\mathbf{B}^*$. For each $b \in B$, since $(b^s, b^t) \in E^{\mathbf{B}^+}$, there exists a vertex $t$ of $T$ such that $\{b^s, b^t\} \subseteq \beta(t)$. We adjoin the tree decomposition $(T_b, \beta_b)$ to $t$ by creating an edge between $t$ and an arbitrary vertex of $T_b$. After having done this for each $b \in B$, we arrive at a pair $(T', \beta')$, which we claim is a tree decomposition of $\mathbf{B}^*$. The pair $(T', \beta')$ satisfies the second condition in the definition of tree decomposition by the definitions of $\mathbf{B}^+$ and $\mathbf{B}^*$, so we consider the first condition, connectivity. A $\mathbf{B}^*$-vertex of the form $b^s$ or $b^t$ does not appear in the tree decomposition $(T_c, \beta_c)$ if $c \neq b$; since such a vertex appears in every bag of $(T_b, \beta_b)$, we obtain the connectivity condition from the connectivity condition holding in $(T, \beta)$ and from the way we adjoined $(T_b, \beta_b)$ to $(T, \beta)$. Any other $\mathbf{B}^*$-vertex has the form $b^x$ (with $x \notin \{s, t\}$) and appears solely in the copy of $(T_b, \beta_b)$ adjoined to $(T, \beta)$; for such a vertex, the connectivity condition is thus inherited from the connectivity condition on $(T_b, \beta_b)$. Since each bag in $(T', \beta')$ is either a bag in $(T, \beta)$ or a bag in $(T_b, \beta_b)$ for some $b \in B$, the width of $(T', \beta')$ is equal to $\max(\mathrm{tw}(\mathbf{B}^+), 5)$, yielding the lemma. □
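The path decomposition of a gadget given in the proof can be validated mechanically. In this illustrative sketch, only one gadget is involved, so the vertex $b^x$ is written simply as the string "x"; $b$ is assumed to lie in every $L_i$, the bags listed in the proof are extended with s and t, and the sketch checks that every edge lies inside some bag and that, since the tree is a path, every vertex occupies a non-empty run of consecutive bags.

```python
def gadget(n):
    """Vertices and edges of G_b when b lies in every L_i (vertex b^x written as "x")."""
    V = {"s", "c", "d", "t"}
    E = {("c", "s"), ("c", "d"), ("s", "d"), ("d", "s1"), (f"t{n}", "t")}
    for i in range(1, n + 1):
        V |= {f"s{i}", f"t{i}", f"u{i}", f"v{i}"}
        E |= {(f"s{i}", f"t{i}"), (f"u{i}", f"s{i}"),
              (f"v{i}", f"t{i}"), (f"v{i}", f"u{i}")}
        if i < n:
            E.add((f"t{i}", f"s{i + 1}"))
    return V, E


def path_bags(n):
    """The bags listed in the proof, in order, each extended by s and t."""
    bags = [{"c", "d"}, {"d", "s1"}]
    for i in range(1, n + 1):
        bags.append({f"s{i}", f"t{i}", f"u{i}", f"v{i}"})
        if i < n:
            bags.append({f"t{i}", f"s{i + 1}"})
    return [bag | {"s", "t"} for bag in bags]


def is_path_decomposition(V, E, bags):
    if not all(any({x, y} <= bag for bag in bags) for (x, y) in E):
        return False
    for v in V:
        positions = [k for k, bag in enumerate(bags) if v in bag]
        # A vertex must appear somewhere, and (on a path) in consecutive bags only.
        if not positions or positions != list(range(positions[0], positions[-1] + 1)):
            return False
    return True


for n in (1, 2, 5):
    V, E = gadget(n)
    bags = path_bags(n)
    print(n, is_path_decomposition(V, E, bags), max(len(bag) for bag in bags) - 1)
# 1 True 5
# 2 True 5
# 5 True 5
```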

With our measuring device (Lemma 19) in hand, we can now bound the number of variables needed to express the sentences $\{H_n\}$.

Lemma 20. Each existential positive sentence in the sequence $\{H_n\}_{n \geq 2}$ is logically equivalent to a sentence in $\mathrm{EP}^6$.

Proof. We will make use of the discussion and notation in the first paragraph of the proof of Lemma 18. There, it is shown that $H_n$ is logically equivalent to $\bigvee_f Q[\mathbf{A}_f^*]$, where the disjunction is over all mappings $f : \{1, \ldots, n\} \to \{1, \ldots, n\}$ and $\mathbf{A}_f$ is a labelled digraph in which each vertex has outdegree exactly 1. It follows directly from the definition of $\mathbf{A}_f^+$ that each vertex in $\mathbf{A}_f^+$ also has outdegree exactly 1.

We now argue that any digraph whose vertices each have outdegree 1 has treewidth at most 2. We prove by induction on the number of vertices that such a digraph has a tree decomposition of width at most 2 in which each vertex appears in a bag. Consider the number $m$ of vertices having indegree 0. When $m = 0$, every vertex has indegree exactly 1 (the indegrees sum to the number of edges, which equals the number of vertices), so the digraph is a disjoint union of cycles, and it is straightforward to verify that it has such a tree decomposition. When $m > 0$, we reason as follows. Let $v$ be a vertex with indegree 0, and let $v'$ denote the unique vertex such that $(v, v')$ is an edge in the digraph. By induction, the digraph with $v$ removed (in which each vertex still has outdegree 1) has a tree decomposition $(T, \beta)$ of the described form; creating a new vertex $u$ with bag $\{v, v'\}$ and linking $u$ to a vertex $t \in T$ with $v' \in \beta(t)$, we obtain the desired tree decomposition.

Let $f : \{1, \ldots, n\} \to \{1, \ldots, n\}$ be a mapping. From the argument just given, we have that $\mathrm{tw}(\mathbf{A}_f^+) \leq 2$. By Lemma 19, we obtain that $\mathrm{tw}(\mathbf{A}_f^*) \leq 5$. It follows that $\mathrm{tw}(\mathrm{core}(\mathbf{A}_f^*)) \leq 5$, since the core of a structure is a substructure thereof and thus cannot have higher treewidth; by Theorem 4, the sentence $Q[\mathbf{A}_f^*]$ is logically equivalent to a sentence in $\mathrm{PP}^6$. Since $H_n$ is logically equivalent to $\bigvee_f Q[\mathbf{A}_f^*]$, it is thus logically equivalent to a sentence in $\mathrm{EP}^6$. □
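The induction above is effectively an algorithm: peel off vertices of indegree 0, decompose the remaining disjoint union of cycles, and re-attach the peeled vertices in reverse order with bags $\{v, v'\}$. The following Python sketch follows that recipe and is illustrative only. It represents an outdegree-1 digraph by a dictionary `f` sending each vertex to its unique out-neighbour, and returns a forest of bags given by parent pointers rather than a single tree; this is harmless because separate components share no vertices and can be linked by arbitrary tree edges. The checker uses the fact that, in a forest, a set of nodes is connected exactly when its node count exceeds its internal edge count by one.

```python
from itertools import product


def decompose(f):
    """Tree decomposition (as a forest) of width <= 2 for the digraph {(v, f[v])}.

    Returns (bags, parent): bags[k] is a set of vertices and parent[k] is the index
    of node k's parent, or None for a root.
    """
    indeg = {v: 0 for v in f}
    for v in f:
        indeg[f[v]] += 1
    # Peel vertices of indegree 0, one at a time, recording the removal order.
    alive, peeled = set(f), []
    stack = [v for v in f if indeg[v] == 0]
    while stack:
        v = stack.pop()
        peeled.append(v)
        alive.discard(v)
        indeg[f[v]] -= 1
        if indeg[f[v]] == 0:
            stack.append(f[v])
    bags, parent, home = [], [], {}
    # The remaining vertices form disjoint cycles c_0, ..., c_{l-1}.
    for start in f:
        if start not in alive or start in home:
            continue
        cycle = [start]
        while f[cycle[-1]] != start:
            cycle.append(f[cycle[-1]])
        if len(cycle) <= 3:
            groups = [set(cycle)]
        else:  # a path of bags {c_0, c_i, c_{i+1}}
            groups = [{cycle[0], cycle[i], cycle[i + 1]} for i in range(1, len(cycle) - 1)]
        for k, bag in enumerate(groups):
            parent.append(len(bags) - 1 if k > 0 else None)
            bags.append(bag)
            for v in bag:
                home.setdefault(v, len(bags) - 1)
    # Re-attach peeled vertices in reverse order, each with the bag {v, f(v)}.
    for v in reversed(peeled):
        parent.append(home[f[v]])
        bags.append({v, f[v]})
        home[v] = len(bags) - 1
    return bags, parent


def is_tree_decomposition(f, bags, parent):
    if not all(any({v, f[v]} <= bag for bag in bags) for v in f):
        return False  # some edge (v, f(v)) is not covered
    for v in f:
        nodes = [k for k, bag in enumerate(bags) if v in bag]
        links = [k for k in nodes if parent[k] is not None and v in bags[parent[k]]]
        if len(nodes) - len(links) != 1:  # empty or disconnected occurrence set
            return False
    return True


def all_functions_ok(n):
    """Check every outdegree-1 digraph on {0, ..., n-1}."""
    for values in product(range(n), repeat=n):
        f = dict(zip(range(n), values))
        bags, parent = decompose(f)
        if not is_tree_decomposition(f, bags, parent) or max(map(len, bags)) > 3:
            return False
    return True


print([all_functions_ok(n) for n in range(1, 6)])  # [True, True, True, True, True]
```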



Proof. (Proof of Theorem 16) We first consider the case that $\sigma$ contains a symbol $E$ of arity 2. By Lemma 20, the sequence of sentences $\{H_n\}_{n \geq 2}$ is contained in ${\approx}\mathrm{EP}^6_\sigma$. By Lemma 18, the following is a many-one polynomial-time reduction from Directed Hamiltonian Circuit to EP-MC($\{H_n\}_{n \geq 2}$): given a directed graph with $n$ vertices, convert this graph to a labelled digraph $\mathbf{B}$ over $\sigma_n$ where each vertex is given a unique label; then, output the pair $(H_n, (\mathbf{B} \times \mathbf{C}_n)^*)$.

We next consider the case that $\sigma$ contains a symbol $F$ of arity $r > 2$. For each $n \geq 2$, let $H'_n$ be the sentence obtained from $H_n$ by replacing each predicate application $E(x, y)$ with the predicate application $F(x, y, \ldots, y)$, where $(x, y, \ldots, y)$ denotes the tuple containing 1 entry of $x$ followed by $(r - 1)$ entries of $y$. Since this replacement does not change the variables used, each $H'_n$ is again in ${\approx}\mathrm{EP}^6_\sigma$. Deciding whether or not $\mathbf{B} \models H_n$ for a structure $\mathbf{B}$ is then equivalent to deciding whether or not $\mathbf{B}' \models H'_n$, where $\mathbf{B}'$ is defined by $F^{\mathbf{B}'} = \{(a, b, \ldots, b) \mid (a, b) \in E^{\mathbf{B}}\}$. With this observation, we derive the theorem in this case from the previous case. □
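The padding step can be illustrated with a few lines of Python. This small sketch (helper names hypothetical) builds $F^{\mathbf{B}'}$ from $E^{\mathbf{B}}$ and checks on an example that an atom $E(x, y)$ holds in $\mathbf{B}$ exactly when its padded counterpart $F(x, y, \ldots, y)$ holds in $\mathbf{B}'$, which is the fact the argument relies on.

```python
from itertools import product


def pad_relation(edges, r):
    """F^{B'} = {(a, b, ..., b) : (a, b) in E^B}, with r - 1 copies of b."""
    return {(a,) + (b,) * (r - 1) for (a, b) in edges}


def padded_args(x, y, r):
    """Argument tuple of the replacement atom F(x, y, ..., y)."""
    return (x,) + (y,) * (r - 1)


E_B = {(0, 1), (1, 2), (2, 0), (2, 2)}
F_B = pad_relation(E_B, 4)
print(sorted(F_B)[0])  # (0, 1, 1, 1)
print(all(((x, y) in E_B) == (padded_args(x, y, 4) in F_B)
          for x, y in product(range(3), repeat=2)))  # True
```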

Proof. (Proof of Theorem 14) Assume that $\sigma$ consists of finitely many unary symbols. Let $\phi$ be any existential positive sentence over $\sigma$. Convert $\phi$ to a disjunction of primitive positive sentences using the syntactic transformations given in the proof of Proposition 3. Each resulting primitive positive sentence can be viewed as having the form $\exists v_1 \cdots \exists v_n (\gamma_1 \wedge \cdots \wedge \gamma_n)$, where $\gamma_i$ is the (possibly empty) conjunction of $\sigma$-predicate applications using just the variable $v_i$. Such a sentence is logically equivalent to $(\exists v_1 \gamma_1) \wedge \cdots \wedge (\exists v_n \gamma_n)$. Each sentence $(\exists v_i \gamma_i)$ is a 1-variable primitive positive sentence that is determined by the $\sigma$-predicate applications that appear in $\gamma_i$; call such a sentence a little sentence. Note that a boolean combination of little sentences is, by renaming of variables, logically equivalent to a 1-variable sentence. We can upper bound the number of little sentences, up to logical equivalence, by $2^{|\sigma|}$. Up to logical equivalence, the number of conjunctions of little sentences can then be upper bounded by $2^{2^{|\sigma|}}$, and the number of disjunctions of conjunctions of little sentences by $2^{2^{2^{|\sigma|}}}$. Hence, each existential positive sentence over $\sigma$ is logically equivalent to one of finitely many boolean combinations of little sentences, each of which is a 1-variable sentence; mapping each sentence to a fixed such representative is a constant-length compilation to EP-MC($\mathrm{EP}^1_\sigma$), and the first item of the theorem is demonstrated.

We next consider the case that $\sigma$ contains a symbol $E$ of arity 2 or greater. Suppose that the problem EP-MC(${\approx}\mathrm{EP}^6_\sigma$) compiles to a polynomial-time decidable problem $Q' \subseteq \Sigma^* \times \Sigma^*$ via a polynomial-length compilation $c$; we will show that the polynomial hierarchy collapses. Let $D \subseteq \Sigma^*$ denote the Directed Hamiltonian Circuit problem under a standard encoding, and let $n(x)$ denote the number of vertices in an instance $x$ of $D$. Let $p$ be a polynomial such that, for each instance $x$ of $D$, it holds that $n(x) \leq p(|x|)$. Let $s$ be the polynomial-time mapping such that an instance $x$ of $D$ is sent to the instance $(H_{n(x)}, s(x))$ by the reduction of Theorem 16. We have that for any string $x \in \Sigma^*$, it holds that $x \in D$ if and only if $s(x) \models H_{n(x)}$, which in turn holds if and only if $(c(H_{n(x)}), s(x)) \in Q'$. There is thus a polynomial-time algorithm that, given an instance $x$ of $D$ and the advice strings $c(H_2), \ldots, c(H_{p(|x|)})$, decides whether $x \in D$: the algorithm computes $s(x)$ and then uses the polynomial-time algorithm for $Q'$ to decide whether $(c(H_{n(x)}), s(x)) \in Q'$. Since each $H_n$ has length polynomial in $n$ and $c$ has polynomial length, the advice strings have a total length that is polynomial in $|x|$, implying that $D$ is contained in P/poly. Since $D$ is NP-complete, this implies that NP is contained in P/poly, which, by the Karp-Lipton theorem [16], implies that the polynomial hierarchy collapses.

Finally, in the case that $\sigma$ contains infinitely many unary symbols, we give an analog of Theorem 16. In particular, we argue that the NP-complete problem Boolean CNF Satisfiability many-one polynomial-time reduces to EP-MC($\{S_n^m\}$), where $\{S_n^m\}$ is a family of existential positive sentences over $\sigma$, via a reduction that maps an instance having $n$ variables and $m$ clauses to an instance involving $S_n^m$. This suffices, as one can then use the same advice-based argument as in the previous case. For $n, m \geq 1$, we define $S_n^m$ to be the sentence $\exists v_1 \cdots \exists v_n \bigwedge_{j=1}^{m} \bigvee_{i=1}^{n} R_j^i(v_i)$, where the $R_j^i$ are pairwise distinct unary relation symbols in $\sigma$; by the argument used for Proposition 12, each $S_n^m$ lies in ${\approx}\mathrm{EP}^1_\sigma$. Given an instance $\phi$ of satisfiability on variables $\{v_1, \ldots, v_n\}$ and $m$ clauses, the reduction outputs $(S_n^m, \mathbf{B})$, where $\mathbf{B}$ is the structure with universe $B = \{0, 1\}$ and relations defined as follows. Define $(R_j^i)^{\mathbf{B}} = \{0\}$ if $\neg v_i$ appears as a literal in the $j$th clause, $(R_j^i)^{\mathbf{B}} = \{1\}$ if $v_i$ appears as a literal in the $j$th clause, and $(R_j^i)^{\mathbf{B}} = \emptyset$ otherwise. It is straightforward to verify that an assignment to the variables satisfies the instance of satisfiability if and only if it satisfies the quantifier-free part of $S_n^m$. □
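The last reduction can likewise be tested on small formulas. In this illustrative Python sketch (names hypothetical), literals are nonzero integers as is customary for CNF inputs, the dictionary `rel` maps the pair $(i, j)$ to $(R_j^i)^{\mathbf{B}}$, and $S_n^m$ is evaluated on $\mathbf{B}$ by brute force. The sketch assumes, as a simplification, that no clause contains both $v_i$ and $\neg v_i$, since the construction assigns a single set to each pair $(i, j)$.

```python
from itertools import product


def build_relations(cnf, n):
    """(R^i_j)^B over B = {0, 1}: {0} if not v_i is in clause j, {1} if v_i is, else empty."""
    rel = {}
    for j, clause in enumerate(cnf, start=1):
        for i in range(1, n + 1):
            assert not (i in clause and -i in clause), "tautological clause"
            rel[i, j] = {0} if -i in clause else ({1} if i in clause else set())
    return rel


def s_true(rel, n, m):
    """B |= S^m_n, i.e. exists v_1..v_n AND_{j=1}^m OR_{i=1}^n R^i_j(v_i)."""
    return any(
        all(any(a[i - 1] in rel[i, j] for i in range(1, n + 1)) for j in range(1, m + 1))
        for a in product((0, 1), repeat=n)
    )


def satisfiable(cnf, n):
    """Brute-force Boolean satisfiability, for comparison."""
    return any(
        all(any(a[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in c) for c in cnf)
        for a in product((0, 1), repeat=n)
    )


cnf = [[1, -2], [2, 3], [-1, -3]]
rel = build_relations(cnf, 3)
print(rel[2, 1], rel[2, 2], rel[3, 1])  # {0} {1} set()
print(s_true(rel, 3, len(cnf)), satisfiable(cnf, 3))  # True True
```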



5 Discussion 



Our study of parameterized complexity yielded that, under the assumption of bounded arity, a set $\mathcal{F}$ of existential positive sentences is fixed-parameter tractable when the sentences are equivalent to bounded-variable sentences, and W[1]-hard otherwise. In first-order logic in general, equivalence of a set of sentences to bounded-variable sentences is sufficient to place model checking in nuFPT.

Observation 21. Let $\mathcal{F}$ be a set of first-order sentences. If there exists $k \geq 1$ such that each sentence $\phi \in \mathcal{F}$ is logically equivalent to a sentence in $\mathrm{FO}^k$, then the problem FO-MC($\mathcal{F}$) is in nuFPT.

Proof. For each sentence $\phi \in \mathcal{F}$, let $\phi'$ denote a logically equivalent sentence in $\mathrm{FO}^k$. One has inclusion of FO-MC($\mathcal{F}$) in nuFPT via the ensemble of algorithms $\{A_\phi\}_{\phi \in \mathcal{F}}$, where $A_\phi$, given an instance $(\phi, \mathbf{B})$ of FO-MC($\mathcal{F}$), evaluates $\phi'$ on $\mathbf{B}$ using the natural bottom-up evaluation algorithm, as in Proposition 13. In particular, the algorithm $A_\phi$, for each subformula $\psi$ of $\phi'$, computes a relation on $B$ of arity at most $k$ containing the satisfying assignments of $\psi$. □

Suppose that $\mathcal{F}$ satisfies the assumption of the observation. The existence of an algorithm that passes from a sentence in $\mathcal{F}$ to an equivalent sentence in $\mathrm{FO}^k$ (or in $\mathrm{FO}^m$ for some fixed $m \geq k$) would permit one to improve the nuFPT upper bound in the observation to an FPT upper bound. We suggest studying for which such sets $\mathcal{F}$ such an algorithm exists.

As borne out by this study as well as others [11, 9, 13, 21], Observation 21 gives a unifying explanation for containment of model checking in nuFPT, under bounded arity. As a direction for future research, we propose that this thus-far unifying explanation is the only possible explanation for containment in nuFPT in equality-free positive first-order logic.

Conjecture 22. Let $\mathcal{F}$ be a set of equality-free positive relational first-order sentences having bounded arity. If there does not exist $k \geq 1$ such that each sentence $\phi \in \mathcal{F}$ is logically equivalent to a sentence in $\mathrm{FO}^k$, then the problem FO-MC($\mathcal{F}$) is W[1]-hard or co-W[1]-hard under nuFPT reductions.

Another interesting research issue is whether or not it is possible to give a reasonable description of the existential positive queries (or, more generally, the first-order queries) that can be evaluated in polynomial time.² For a set of sentences $\mathcal{F}$ contained in ${\approx}\mathrm{EP}^k$, if the passage from a sentence $\phi \in \mathcal{F}$ to an equivalent sentence $\phi' \in \mathrm{EP}^k$ can be performed in polynomial time, then EP-MC($\mathcal{F}$) is in polynomial time: perform the passage and then (as in Proposition 13) invoke bottom-up evaluation on the resulting $\mathrm{EP}^k$ sentence. Also, when this passage is in polynomial time, each sentence $\phi$ is clearly equivalent to a polynomial-size sentence $\phi' \in \mathrm{EP}^k$. We suggested that for sets $\mathcal{F}$ of bounded-arity sentences, inclusion of FO-MC($\mathcal{F}$) in nuFPT is governed by expressibility in bounded-variable logic; could inclusion of FO-MC($\mathcal{F}$) in polynomial time be related to expressibility in bounded-variable logic via polynomial-size sentences?

Acknowledgements. The author thanks Johan Thapper for useful comments. 